REVERSALS IN READING AND WRITING


GERTRUDE HILDRETH 
Lincoln School of Teachers College 


In the early stages of learning children frequently make reversals 
in reading and writing. A reversal consists of the lateral inversion 
of word elements or written symbols, or the reading or writing of
material in sinistral rather than dextral sequence. Many varieties of
reversals, of which the following are typical examples, are observed in
the oral reading and in the written work of young children:

In writing 

1. The reversal of single letters or numbers, as a laterally inverted
N for N or a laterally inverted 6 for 6.

2. Reversal of the order of several letters or digits, but not of the 
individual elements themselves, 93 for 39 or ON for NO. 

3. The occasional reversal of letters and numbers composing 
the word or number series as well as in the series as a whole, HNAJ for 
JANE. 

4. Complete mirror writing in which the material as a whole is 
oriented in a right to left direction, and in which all separate elements 
are laterally inverted. Illustrations of such writing are given by 
Gordon, Lord, Carmichael and Dearborn, and Burt.

In reading, reversals are primarily of two types: 

1. The pronunciation of single words, or sound elements as they 
would be pronounced if written in inverse order or as though individual 
letters were inverted, as for example, “on” for “no,” “bread” for
“bear” or “big” for “dig.”

2. Inversion of the order of words in a phrase or sentence, as for 
example “Kitty see I” for “I see a kitty.”

Most studies of the reversal phenomenon have been confined to 
the more severe clinical cases in which mirror writing appears quite 
frequently and mirror reading occasionally. In recent years interest 
in the subject has greatly increased as the result of more general
study of school children in clinics, and the increased number of
published reports on the subject. Illustrations of cases of reversals
and explanatory theories are contained in recent publications of
Orton, Monroe, and Dearborn. With the exception of a recent
study by Gates there has been little general study of the reversal
tendency in large numbers of unselected children of elementary school 
age, though several studies report investigation of reversals in kinder- 
garten or beginning first grade children. 

Writing reversals of young children before the period of formal 
school training were investigated by Hildreth by means of simple
copying and perception tests given to entire kindergarten and first
grade classes. Descoeudres reported dextral-sinistral orientation
tendencies in young children as contrasted with older children.
Teegarden, through matching and copying tests, investigated reversal
tendencies in first grade children. Smith reports the direction
kindergarten and first grade children follow in naming pictures shown
them, and Davidson reports the results of perception-reversal tests
given to unselected kindergarten and first grade children. Detailed 
results of the various studies cannot be given here, but all point to 
the frequency of the tendency in normal young children and the decline 
of the tendency with gain in mental maturity and experience. 

Correct left to right orientation required in English reading and
writing is the product of practice and experience. Writing that young
children produce out of their own spontaneous interest, before any
formal writing instruction has been given, often illustrates the
child’s confused perception with regard to lateral orientation, to say
nothing of items written upside down or incorrectly formed. Three
illustrations of such writing follow:

Name      Age    IQ
Robert    5-0    133
Samuel    4-0    142
Paul      5-4    1 (remaining digits illegible)

[Handwriting samples not reproduced.]


Similar cases are cited by Burt and Stern.

The apparent relation between reversals and reading difficulties 
cited by a number of investigators naturally becomes the concern 
of teachers and educators as well as of child specialists and clinical 
workers. Because of the importance of the subject to education 
and the somewhat confusing reports of studies based chiefly on clinical 
cases, it seemed timely to undertake a comprehensive study, using
unselected class groups, of the tendency of elementary school children
in public and private schools to make reversals in reading and writing.
Objective, uniform tests were used in obtaining data. 


TOPICS OF INVESTIGATION 


The study as carried out furnishes answers to the following ques- 
tions, so far as the particular groups studied were concerned: (1) With 
what frequency are reversals made in reading and writing at different 
grade levels? (2) What is the consistency of the reversal tendency 
from test to test in the reading or writing of individual pupils? (3) 
What types of items are most frequently reversed? (4) What is the 
proportion of reversal errors to total errors made? (5) What is the 
relation of the tendency to make reversals to reading success? (6) 
What is the relation of the tendency to maturity? to experience? (7)
How do public and private school pupils of the same grade levels 
compare in reversal tendency? (8) Do children who have practice 
in both English and Hebrew reading and writing show more or less 
confusion than children who read and write only English? (9) What 
is the relation of dominant left-handedness to the tendency to make 
reversals in reading and writing? 


THE TESTS AND THE SUBJECTS 


Two sets of tests were given to entire classes of primary grade 
children in three types of schools. Because of limitations of time 
it was not always possible to use the same test series consistently 
in the three types of schools, but the testing was adequate enough 
to insure reliable and not merely chance results. The reading tests 
consisted of diagnostic reading materials, and yielded reversal data
such as one would ordinarily observe in the child’s normal oral read- 
ing. The reading tests contained a word pronunciation test of fifty 
words of graded difficulty, three oral reading paragraphs ranging in 
difficulty from first grade to fourth or fifth grade level, and a set of 
flash cards exposed for uniform lengths of time but containing increased 
amounts of material of uniform difficulty. 

Samples from the word pronunciation test are (1) she, (2) no, 
(3) pig, (5) how, (9) bear, (15) grass, (17) which, (20) paint, (23)
mountain, (29) wrapper, (34) monarch, (36) chagrin, (40) jubilant, (46) 
remonstrance. 

The first two sentences from the second oral reading paragraph 
follow: One morning when Billy was on his way to school he looked down 
the street and saw clouds of smoke rolling up to the sky. He knew that 
there must be a big fire down town. 

The administration of these tests to pupils individually enabled 
the examiner to record all errors and omissions as well as the time 
required. In the word pronunciation test the child’s response to 
each word was recorded as accurately as possible. 

In order to test reversals made in writing and copying a perception- 
copying test was devised consisting of a series of flash cards containing 
the following material, one item or series appearing on each card: 


1. D      6. ←      11. bed     16. 579     21. from
2. N      7. 9      12. JCE     17. trap    22. dear
3. S      8. ||     13. was     18. able    23. toward
4. Z      9. 36     14. dim     19. (@4     24. corn bread
5. 24    10. on     15. how     20. look    25. four and six


In taking the test the child had before him a sheet of ruled writing 
paper. Each card was exposed for three seconds, and pupils were 
instructed as soon as the card was taken away to copy what they had 
seen. The material selected was for the most part well within the 
maturity level of the children tested and the test proved to be easy for 
all but a few of the youngest second grade children. 

The entire set of reading tests was given to all third and fourth 
grade pupils in one public school in a large city, and part of the test 
series was given in the second grade of the same school. The same 
tests were given in the second grade of a private school enrolling pupils 
of better than average ability. The perception-copying tests were 
given later in the year to all second, third and fourth grade children 
in the same public school, as well as to all pupils of the same grade
levels in the same private school, and in addition, to the same grades 
in a private Hebrew-English school, where children begin the reading 
and writing of Hebrew in the first grade several months after they 
begin reading and writing English. 

In addition to these tests the Kuhlmann-Anderson Intelligence 
Examination was given to the public school pupils and Stanford-Binet 
records were available for all other children. While the children were 
being examined, observation was made of preferred handedness and 
teachers were asked to furnish lists of all pupils who were dominantly
left-handed, or ambidextrous. Such evidence revealed the most
consistently left-handed or ambidextrous children. The total score
on the entire series of diagnostic reading tests, including a number
not described here, was used as the measure of reading success. The
chronological age of each child was obtained, and in a few unusual
problem cases some information was gathered concerning the child’s
developmental history, home background and former school experiences.

The median intelligence quotients of all pupils in the three schools 
were: Public school, one hundred; private school, one hundred nine- 
teen; Hebrew school, one hundred sixteen. 

The number of boys and girls in all three types of schools was 
fairly equally divided. The private school pupils were for the most
part slightly younger than the public school pupils of the same grade
levels. In the “English” private school there is less formal teaching
of skills, and less time given to reading, writing and spelling in the
first three grades than in the public school. With respect to skills-
teaching the Hebrew-English school falls between the other two. In
the private schools each child has advanced at the rate of a grade a
year since admission to first grade, but in the public school there were
“repeaters” of one or more semesters in each grade.

Curves of progress in the two types of tests constructed from the 
median scores of successive grades in the different schools are not 
learning curves in the strict sense of the term. For the construction 
of true learning curves repeated measures of the same individuals 
over a period of time would be required and such repeated measure- 
ments have not yet been made. But since the character of the 
population of the three types of schools is quite constant from year 
to year and from grade to grade, the medians computed from tests 
given to entire class groups within any one school give a fairly accurate 
picture of group growth from year to year. 


THE FREQUENCY OF REVERSALS


Reversals in Reading Tests—In computing the number of reversals 
in the reading tests it was necessary first to decide what constituted a 
reversal. The most clear-cut cases were “no” for “on” and “was” for
“saw.” There were a number of instances of “through” for “thorough,”
but this error was counted as an error of omission or substitution
rather than a reversal error. “Who” for “how” was counted as a
reversal, as was “chargin” for “chagrin,” though in these cases the
reversals are not of the entire configuration. It would certainly be
far-fetched to consider “back” a reversal of “down,” though the
first letter of the word may have been inverted by the child. Some of
the apparent reversals may be only substitutions of words for the
correct pronunciation that accidentally contain some reversed element
of the correct word. “Or” for “to” is sometimes a logical and sensible
substitution, and may possibly be interpreted as such rather than as
a reversal. Since we are unable to see into the child’s mind to
discover what his mental processes are when observing words, some
arbitrary criterion must be employed for determining the number of
reversal errors made. In the present instance the criterion for
determining reversal errors was very liberal. Practically any word that
was pronounced as though the entire word or the largest share of it was
spelled in inverted lateral order was counted as a reversal even though
the letters were not arranged in perfect inverted order. “Who” for
“how” is an example.


REVERSALS IN WORD PRONUNCIATION
PUBLIC SCHOOL

Grade   Reversal                        Cases
2A      [illegible]                       6
        “warper” for “wrapper”            1
        “arper” for “wrapper”             1
        “war” for “wrapper”               1
        “bread” for “bear”                1
        “read” for “bear”                 1
2B      [illegible]                       7
        “bread” for “bear”                2
        “who” for “how”                   1
        “follow” for “flower”             1
3A      “bread” for “bear”                1
        [illegible]                       [illegible]
        “chargin” for “chagrin”           3
3B      “who” for “how”                   1
        “follow” for “flower”             1
        “spe” for “lapse”                 1
        [illegible]                       1
        “chargin” for “chagrin”           4
        “place” for “lapse”               1
4A      [illegible]                       1
        “chargin” for “chagrin”           6
4B      “chargin” for “chagrin”           1

Total number of children examined: 2A, 40; 2B, 38; 3A, 30; 3B, 36; 4A, 40;
4B, 37.

In spite of this broad interpretation of reading errors as reversals 
the total number of reversals made by the children was surprisingly 
small in proportion to the total number of all kinds of errors made by
each child. 


PRIVATE SCHOOL

Reversal                        Cases
[illegible]                       6
[illegible]                       1
[illegible]                       1
[illegible]                       1
[illegible]                       2

Total number of cases examined, 37.
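
The criterion for counting a reading error as a reversal, described above, can be made concrete with a small sketch. The function names and the three categories below are hypothetical and are only one illustrative reading of the examples quoted in the text; they are not the scoring procedure used in the study, whose criterion was more liberal.

```python
# Hypothetical helpers: an illustrative reading of the examples in the text,
# not the study's actual scoring procedure.
MIRROR_LETTERS = {"b": "d", "d": "b", "p": "q", "q": "p"}


def is_whole_word_reversal(target: str, response: str) -> bool:
    """True when the response is the target spelled right to left."""
    return response != target and response == target[::-1]


def is_mirrored_letter_error(target: str, response: str) -> bool:
    """True when the only differences are letters replaced by their mirror twin."""
    if len(target) != len(response) or target == response:
        return False
    return all(t == r or MIRROR_LETTERS.get(t) == r
               for t, r in zip(target, response))


def is_adjacent_transposition(target: str, response: str) -> bool:
    """True when exactly one pair of neighbouring letters has changed places."""
    if len(target) != len(response):
        return False
    diffs = [i for i, (t, r) in enumerate(zip(target, response)) if t != r]
    return (len(diffs) == 2
            and diffs[1] == diffs[0] + 1
            and target[diffs[0]] == response[diffs[1]]
            and target[diffs[1]] == response[diffs[0]])


def classify(target: str, response: str) -> str:
    """Assign a response to the first category it satisfies."""
    if is_whole_word_reversal(target, response):
        return "whole-word reversal"
    if is_mirrored_letter_error(target, response):
        return "mirrored letter"
    if is_adjacent_transposition(target, response):
        return "adjacent transposition"
    return "other error"


if __name__ == "__main__":
    # (printed word, child's response) pairs taken from the text
    for target, response in [("no", "on"), ("saw", "was"), ("dig", "big"),
                             ("chagrin", "chargin"), ("how", "who")]:
        print(f"{response!r} for {target!r}: {classify(target, response)}")
```

This strict sketch labels “who” for “how” as “other error,” whereas the study counted it as a reversal; a rule that mechanical would therefore report even fewer reversals than the tabulations show.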


The number of reversals made in reading was not only relatively 
small but the reversal errors are scattered quite evenly among the 
children who tend to make them. In public school groups two children 
made three reversals, eighteen children made two reading reversal 
errors and the remaining thirty-seven children who made reversals 
made one each. One child made one reversal error in a single word 
and twice reversed the order of words in sentences. 

In the private school second grade one child made two reversal 
errors and nine children made one error each. 

The relation of reversal errors to all other types of errors, sub- 
stitutions, omissions, and partially correct responses is shown in the 
following table: 




















Grade                         Total errors   Reversal errors   No. of pupils
2A                                1481             11               40
2B                                1205             11               37
3A                                 710              5               30
3B                                 792              9               36
4A                                 799              7               40
4B                                 600              1               37
Private school second grade       1090             11               37

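
The proportion of reversal errors to total errors in the word pronunciation test can be worked out directly from the figures in the table. The sketch below is only an arithmetic illustration; the variable names are hypothetical, and the 3B reversal count is taken as 9, an assumption based on the nine 3B cases itemized in the word pronunciation tabulation.

```python
# Total errors and reversal errors per grade, from the word pronunciation table.
# The 3B reversal count (9) is an assumption; see the lead-in.
word_pronunciation = {
    "2A": (1481, 11),
    "2B": (1205, 11),
    "3A": (710, 5),
    "3B": (792, 9),
    "4A": (799, 7),
    "4B": (600, 1),
    "Private 2": (1090, 11),
}


def reversal_share(total_errors: int, reversal_errors: int) -> float:
    """Reversal errors as a percentage of all errors."""
    return 100.0 * reversal_errors / total_errors


shares = {grade: reversal_share(total, reversals)
          for grade, (total, reversals) in word_pronunciation.items()}

if __name__ == "__main__":
    for grade, share in shares.items():
        print(f"{grade}: {share:.2f}% of errors were reversals")
```

On these figures reversals make up roughly one per cent or less of all errors in every group, which is the sense in which the number of reversals is small in proportion to the total.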

The list of reversal errors divides quite naturally into two types:
Words that are identical or practically so when reversed, containing
the same letters and the same number of letters but in reversed order
as “no” for “on”; and words which children at appropriate grade
levels ordinarily have difficulty in recognizing because of their
complexity or infrequency. Examples are: lapse, chagrin, wrapper, bread.


REVERSALS IN PARAGRAPH READING
PUBLIC SCHOOLS

Grade   Reversal                        Cases
2A      [illegible]                       1
        [illegible]                       1
        “ever” for “have”                 1
        [illegible]                       3
2B      “or” for “round”                  1
        “made” for “named”                1
3A      “made” for “named”                1
        “spare” for “spread”              1
        “star” for “short”                1
        “was” for “saw”                   1
        One child also reversed the order of words in his reading
        but did not make reversals in individual words.
        One child also inverted the order of words read in one row,
        reading instead of “our pet is a baby rabbit,” “our baby
        is a pet.”
3B      [illegible]                       2
        [illegible]                       2
        [illegible]                       1
        “turn” for “truck”                1
        “was” for “saw”                   1
        “made” for “named”                1
        “sperd” for “spread”              1
        [illegible]                       1
4A      [illegible]                       3
        [illegible]                       1
        [illegible]                       1
        [illegible]                       1
        “read” for “heard”                1
        “crow” for “cowardly”             1
4B      “who” for “how”                   1
        [illegible]                       1
        “was” for “saw”                   1

PRIVATE SCHOOL

Reversal                        Cases
[illegible]                       1
[illegible]                       1
[illegible]                       1

Each of these words contains groups of consonants that almost
invariably constitute difficulties for children in early stages of
reading quite apart from any “reversal tendency” individual children
may show. The reversal substitutes are interesting in themselves.
Ordinarily they bear some resemblance to the original, and in most
cases easier words are substituted for the original difficult words.
In the case of “chagrin,” an unfamiliar word to practically all the
children, the substitutes usually contain the “in” of the original
word, but “charg” is usually substituted for “chagr,” possibly because
of the familiarity and meaningfulness of “charge” to the children. The
normal adult reader occasionally finds himself making such
substitutions, as did one individual who in reading the proper name
“Golsan” persistently called it “Goslan.” When children meet an
unfamiliar word they tend strongly to substitute for it a word of the
same general length and configuration, containing at least some of the
same letters, but a word that is meaningful to them. The substitution
of “drive” for “divine” in the present test is a case in point. Almost
universally, too, there is more confusion in the middle and end parts
of words than in initial elements or syllables. In the course of such
substitution it is natural that quite accidentally some of the
substitutions will contain reversals of elements of the correct words.

In view of these facts, the number of true reversal errors, that is 
errors due to the child’s getting an inverted image of the word, is 
probably even smaller than the tabulation indicates. The many 
other types of errors made by the children will not be summarized here. 

The total number of reversals in relation to all errors and omissions 
is shown in the following table: 


PUBLIC SCHOOLS

Grade   Total errors   Reversal errors
2A          521              6
2B          231              2
3A          614              5
3B          575             11
4A          568              8
4B          314              3

Grades 2A-3A read only the first paragraph. Grades 3B-4B read two or
more paragraphs.

PRIVATE SCHOOL

Second grade   906   3   (for first and second paragraphs)

In the oral paragraph tests as in the word pronunciation, pupils
make few reversal errors in comparison with the total number of
errors found, and the reversal errors are of the same general
types. The general tendency is to substitute familiar for unfamiliar
words and to make reversal errors on words that are true words when
spelled in reverse order. These children seldom make reversal errors
in which they substitute for a real word a nonsense word or syllable.
Thus we find “on” for “no” but seldom “ro” for “or” or “ti” for
“it.” The reversal error made is ordinarily a meaningful or at least
a logical one. Since “on” and “no,” “saw” and “was” have such
close resemblance to each other and are commonly confused in beginning
reading, it may be that when the error is pointed out to the child
the law of association by opposites operates and a bond is formed
between the two forms of the configuration, increasing the difficulty
of distinguishing them. In some cases the reversed word makes sense
with the preceding words, but not with what follows, and the child
who has not developed anticipatory facility may make more errors
than the child who has.

These results point in general to a decline in reversals from lower 
to higher grades paralleling the decline in all errors and in the time 
required for reading; and to lack of concentration of reversals in the 
individual cases studied. 


REVERSALS IN EYE SPAN TESTS 


This material was so comparatively easy for all pupils that few 
errors of any kind were made, but slow readers were unable to attempt 
as much as others and consequently their opportunity to make reversal 
errors was reduced. Errors made in these tests are shown in the
accompanying table.

The extent to which the same children made reversal errors in 
two or more of the tests is indicated in the following tabulation: In 
all the public school groups tested nine children made errors in both 
the oral reading and pronunciation tests, one child made errors in all 
three of the tests and one child made errors in both the paragraph 
reading tests and eye span. All other children of those who made 
reversals, sixty-one cases, showed the difficulty in only one type of 
test. In the private school second grade, no child made reversals 
in all three tests, two made errors in both eye span and word
pronunciation, and one made reversal errors in both word pronunciation
and paragraph reading. In one of the second grade groups one child made
reversal errors on both “on” and “pig” in the word pronunciation
lists but had no difficulty when he met the same words in the context 
of the oral reading paragraphs. From such results it is clear that 
there is not a high degree of consistency in the tendency of children 
to make reversals when they respond to reading material in different 
situations. 

REVERSALS IN EYE SPAN TESTS
PUBLIC SCHOOL

Grade   Reversal
3A      “saw” for “was”
        “no” for “on”
        One case of reversing the order of words.
3B      No reversals
4A      “of” for “to”
4B      No reversals

PRIVATE SCHOOL

Reversal                        Cases
[illegible]                       1
[illegible]                       2
[illegible]                       1

Only one child made reversal errors in all three types of tests, but
he made no reversal errors on the writing-perception tests to be
described farther on, in writing numbers in arithmetic or in writing
his name. This child, a boy of low normal intelligence in the third
grade, was the poorest reader of the entire group and was pointed out
as a case showing great confusion in word recognition. Actually
he made only one reversal in individual words, but twice read parts
of flash card sentences in reverse order: “Hill the on,” instead of
“on the hill”; and “home run can you said mother his,” instead
of “his mother said ‘you can run home now.’” The flash card reading
was in larger type than the other test material, requiring wider eye
span. The child made no such reversal errors when reading the oral
paragraphs in ordinary size type, although the paragraph material
was no easier than the flash cards. This child made all the logical
types of errors that most beginning children make, but he made many
more of them and worked much more slowly than the average third
grader. To this child reading was a meaningless process, consisting
of apparently nothing but word pronunciation, and of “spelling out”
the words.


REVERSAL ERRORS AND MENTAL ABILITY


The evidence of reversal errors is too slight in the present data to 
draw any definite conclusion concerning the relationship between the 
factor of mental ability and reversal tendency. Too few reversals
were made by any of the children to class them definitely as reversal
types and the groups were on the whole of average or better intelligence 
as measured by group and individual tests. The median IQ of all 
pupils in the public school groups was one hundred, and the median 
of pupils who made two or more reversal errors, nineteen cases in all, 
was ninety-two (two did not take the intelligence test), a difference that 
is not highly reliable because of the small number of cases involved. 
More evidence concerning the relation of the mental maturity factor 
to the tendency to make reversals is given in the summary of the 
perception-copying data to follow. 


REVERSAL ERRORS ON THE PERCEPTION COPYING TESTS 


This test offered clean-cut evidence of reversals in the child’s own 
writing or drawing. Space is lacking to include a list of all the 
reversal errors made, but the following tabulations indicate the number 
of reversals made by pupils taking the test in the three types of schools 
at three grade levels. In test items consisting of a series of units 
there was practically no complete mirror tendency. Only part of the 
item was ordinarily reversed, as for example, “bear” for “dear” or
“cron” for “corn.” Of all reversals made, only one of the series
items, at the second grade level, was completely reversed by one
child: 975 for 579.
The number of reversal errors made on different items of the test in
the three schools is shown in the accompanying table.

All other types of errors made were of such uneven value that the 
number of such errors made gives a less clear picture of the relative 
difficulty of the test for pupils of different grade levels than the number 
of omissions. A comparison of the total number of omissions for each
grade and school with the total number of reversals made indicates a
close relationship between the two. As the test becomes easier
reversals decline, and the test becomes easier as children become more 
mature and more experienced in writing, spelling and reading. This 
change comes about not with any attention to the specific reversal 
elements themselves, but as the natural result of learning and maturity. 
This conclusion is substantiated further by comparison of results for the
“English” private school and the public school. Pupils of the public
school make fewer reversals in the lower grades than those of the
private school, although they are considerably less intelligent as a
group and have not been practiced in the print-script in which the
letter elements of the test were presented. The difference is to be
explained on the basis of learning and practice. The private school
defers the teaching of skills longer than the public school and in the
primary grades devotes much less time each day to them. The public
schools contain “repeaters” who would benefit from overlearning, but
the private schools do not. In the private schools the greater mental
capacity is shown to count in the more rapid reduction of reversals as
the fourth grade is reached.

                     Public school, grade    Private school, grade   Hebrew school, grade
                     Second Third Fourth     Second Third Fourth     Second Third Fourth    Total
Reversals, total       58     28    16         101    54    17         16     4     1        295
Omissions, total       43     14    12         349   174    19         64    23    19
Number of cases        86     62    82          34    37    36    [illegible] 13    11

Reversals by item. The entries of each row are given in printed order;
their grade columns are not recoverable, and the last figure in each
row is the row total.

Item               Entries                  Total
[illegible]        10                         10
[illegible]        13, 12, 5, 8, 5, 5, 1      49
[illegible]        2, 1                        3
[illegible]        1, 1                        2
[illegible]        4, 4                        8
[illegible]        [illegible], 2              3
[illegible]        4, 1                        5
[illegible]        1, 2                        3
[illegible]        8, 1                        9
[illegible]        1                           1
[illegible]        9, 3, 6, 4, 4              26
[illegible]        1, 1, 7, 7, 1, 4, 1        22
[illegible]        3, 1                        4
[illegible]        13, 1, 2                   16
[illegible]        1, 6, 2, 1                 10
corn bread         8, 2, 1, 10, 6, 7, 1       35
four and six       6, 2, 1, 1                 10
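
The close relationship between omissions and reversals noted above can be checked roughly from the column totals of the table. The sketch below converts both totals to per-pupil rates and computes a Pearson correlation across the school-grade groups. The group labels and function names are hypothetical, the Hebrew school second grade is left out because its number of cases is not legible, and a coefficient over eight groups is only a rough descriptive check, not an analysis reported in the study.

```python
import math

# (reversals, omissions, number of cases) per school and grade, from the table.
# Hebrew second grade is omitted: its number of cases is not legible.
groups = {
    "Public 2": (58, 43, 86),
    "Public 3": (28, 14, 62),
    "Public 4": (16, 12, 82),
    "Private 2": (101, 349, 34),
    "Private 3": (54, 174, 37),
    "Private 4": (17, 19, 36),
    "Hebrew 3": (4, 23, 13),
    "Hebrew 4": (1, 19, 11),
}

# Errors per pupil, so that groups of different size can be compared.
reversal_rate = {g: rev / n for g, (rev, _om, n) in groups.items()}
omission_rate = {g: om / n for g, (_rev, om, n) in groups.items()}


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson(list(reversal_rate.values()), list(omission_rate.values()))

if __name__ == "__main__":
    for g in groups:
        print(f"{g}: {reversal_rate[g]:.2f} reversals, "
              f"{omission_rate[g]:.2f} omissions per pupil")
    print(f"Correlation across groups: {r:.2f}")
```

The coefficient is high but is driven largely by the private school second and third grades, where both omissions and reversals are far more numerous per pupil than in any other group.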

Almost any item of the perception-copying test might have been 
reversed by the pupils. Why, then, are some items more frequently 
reversed than others? The items reversed appear to be those
generally that were either quite unfamiliar to the children and
therefore unpracticed, or familiar items for which there is some exact
reversal counterpart which receives an almost equal amount of practice,
as for example “b” and “d,” and for which the child forms an
opposites-association bond. Young primary children who have not formed
correct visual images of such items naturally have difficulty when the
only differentiating characteristic is the orientation of the item in
a lateral direction. More children failed to orient N correctly
than any other single item because of lack of practice and the very
close identity of the configuration N and its mirror image, with the
diagonal in a subordinate role. Fortunately most of the symbols a young
child has
to learn have more distinguishing characteristics than the element 
of lateral orientation alone. 

Children are naturally more apt to make reversals when the 
material is meaningless to them as in the case of N, because of the 
difficulty of forming meaningful associations which would help to 
fix the item in memory while the card containing the item is shown. 
In fact, probably many young children get only a general Gestalt 
of this figure and pay no attention to the direction of the diagonal. 
They get an impression that the figure contains a diagonal and that 
is all. Since there are only two positions for the diagonal, young
children have a fifty-fifty chance of drawing it correctly by accident.
The child may have a confused or vague image, rather than
a clearly inverted one. We find that older or more mature children
have less difficulty than young children, even though lacking specific 
practice on the item, because of generally more mature perceptive 
powers, with greater attention to detail, better memory for slight 
differences in form. The same sharpening of perceptual discrimina- 
tion with maturity is shown in the comparative success of older and 
younger children on the Binet Form-Similarities test of the four year 
level.
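
The fifty-fifty argument above can be put in numbers under a deliberately simple model. The sketch assumes, purely for illustration, that a child either reproduces the diagonal correctly or guesses its direction at random, so that guessers reverse it half the time; the function names are hypothetical and the model is not part of the study.

```python
def implied_guessing_fraction(reversal_rate: float) -> float:
    """Fraction of children guessing, if only guessers reverse (half the time).

    Under the assumed model the observed reversal rate equals g / 2,
    so g = 2 * reversal_rate, capped at 1.
    """
    if not 0.0 <= reversal_rate <= 1.0:
        raise ValueError("reversal_rate must lie between 0 and 1")
    return min(1.0, 2.0 * reversal_rate)


def chance_all_correct(attempts: int) -> float:
    """Probability that a pure guesser draws the diagonal correctly every time."""
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return 0.5 ** attempts


if __name__ == "__main__":
    # Hypothetical illustration: 15 per cent of a class reverse the figure.
    print(implied_guessing_fraction(0.15))
    # A guesser is right three times running one time in eight.
    print(chance_all_correct(3))
```

On this model the children who reverse the figure on a single trial are only about half of those who have no clear image of it, since the other half draw it correctly by accident.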
On the perception-copying test, the next most frequent reversal
error occurred in word elements containing “b” or “d.” These are
also the letter elements that often cause confusion in reading.
Children frequently interrupt their reading to inquire “Is that a b or
a d?” The error is more frequently made out of context as in the
present test than it would be were there contextual relationship to
assist. There is less tendency to reverse g, e, h, r and the like,
than b or d or p and q for the reason that the former do not make real
letters when reversed whereas the latter do. There is no such word as
“ɔat” (“cat” with the c reversed) that could be confused with “cat”;
whereas “big” and “dig” may easily be confused by the young child.
These results indicate that one may normally count on “b” and “d”
offering reversal difficulty to young children more often than other
letters. Errors were frequent
in the series JCE which was a meaningless item and therefore difficult 
to keep in mind. Children have had no practice with this particular 
series as a separate item. Errors were also common on the middle 
parts of the words contained in the test, or, ro, ea, ae, re, er, bl, 
and in the writing of numerals. In general we may expect to find 
in the writing and spelling of young children confusion in elements 
that are meaningless or unpracticed and reversal errors in the middle 
parts of words or symbol elements in series, and in the orientation 
of isolated numbers. 

These conclusions are somewhat similar to those for errors in
the reading tests. Single elements or parts cause more confusion
than the elements as a whole and difficulties seem often to be not
so much inherent in the children as in the nature of the material.
The laws of perception, configuration, association, and learning
explain many of the reversal errors made.

No case of complete mirror writing was found in all the data
examined. Although such cases are reported as recurring quite
frequently among the feebleminded (Gordon), among typical populations
of school children true mirror writing must appear very infrequently.
Beeley found 42 true mirror writers in a school population
of 106,356, and Gordon found 4 mirror writers among 829 elementary
school children when they were asked to write their names with their
left hands. Gordon finds mirror writing associated with mental
deficiency.

The Hebrew school data are included in this study to determine 
what effect training children to read and write in two directions almost 
simultaneously might have on the tendency to make reversals. On 
the whole, there appears to be no more confusion, proportionately, 
than in the non-Hebrew private and public schools, judging from the 
results of the perception copying tests. This conclusion must neces- 
sarily be tentative on account of the small number of cases and the 
narrow scope of the test. These children were of better than average 
mental ability and have favorable conditions for learning. 


SINISTRALITY AND REVERSAL ERRORS


A number of specialists have indicated a possible connection 
between reversals and dominant handedness. Children who are
originally dominantly left-handed, it is alleged, are more prone to
make reversals than children predominantly right-handed. Both in
the present test data and in the study reported earlier there was no 
clear cut evidence that this is necessarily so. There were some 
children in the present study who were predominantly left-handed
and also made relatively more reversals than the rest of the group, but 
every such case could be matched with the case of a right-handed 
child who also makes more reversals than the average or than the 
majority of the group. There was however one case of a left-handed 
child who held his paper in such an extreme position, that all his 
writing must have appeared upside down to him and the result was a 
larger number than average of reversal errors. 

Dearborn says “First it was noted that among the poor readers
there was an unusual proportion of left-handed and especially ambidextrous
individuals.” This did not prove to be the case in the data of
the present study either with respect to reading success or success with 
the copying material. The number of reversals made by left-handed 
children in the perception-copying material is shown in the table
below.

In all schools and grades right-handed children averaged 1.67 
reversal errors each; left-handed children, 1.85. 

Results were similar for reversals on the reading tests. Both left- 
and right-handed children make reversals with about equal propor- 
tionate frequency. There is no marked predominance of reversals 
among the left-handed children. 


RELATION OF REVERSALS TO READING SUCCESS 


The complete diagnostic reading tests containing other tests than 
those analyzed for reversal errors were given to all the third and fourth 
grade public school children and the second grade private school 
pupils. Of all the children in the public school groups who made more
than one reversal error on the three reading tests, the proportion of
these who were also in the lowest third of the group in total diagnostic
reading score affords some indication of the relationship between the 
tendency to make reversals and success in reading. These conditions 
held true for five of the third grade children and for only one of the 
fourth grade children. This is half of the total number of those who 
made two or more errors in these groups. These results would indicate 
some positive relationship between the reversal tendency and poor 
reading progress in the third grade, but not in the fourth. In the
fourth grade, poor readers make many types of reading errors, but
reversal errors very infrequently. The small number of cases involved
prevents any conclusion drawn from having high reliability.

                          Right-handed               Left-handed
                     Reversals  No. of cases    Reversals  No. of cases

PUBLIC SCHOOL
  Second grade           6           1              5           1
                         4           2              3           1
                         3           1              1           2
                         2           5              0           2
                         1          15
  Third grade            2           5              1           2
                         1          16              0           3
  Fourth grade           2           1              1           1
                         0          14              0           4

PRIVATE SCHOOL (ENGLISH)
  Second grade           0           1             11           1
                         5           4              8           1
                         4           2              7           1
                         3           1              2           1
                         2           4              1           3
                         1           6
  Third grade            5           1              5           1
                         4           1              4           2
                         3           3              1           1
                         2           7
                         1           7
  Fourth grade           2           3              2           1
                         1           6              1           2
                         0           2

PRIVATE SCHOOL (HEBREW)
  Second grade           4           1              1           1 (ambidextrous)
                         3           1
                         2           3
                         1           2
  Third grade            2           1          (No left-handed children in group)
                         1           2
  Fourth grade           1           1              0           1

A study of all the errors made by these pupils who are in the lowest 
third of the group in reading success and who made two or more 
reversals indicates that all other types of errors, substitutions, omis- 
sions, repetitions, are many times more common than reversal errors 
which constitute only a small part of the total picture of reading 
disability. The results for the private school second grade show the 
same situation. 

Some of the pupils in the lowest third of the group in reading 
made no reversal errors whatsoever. In fact, one of the second grade 
children, the poorest reader in the group, whose success in reading 
would rate as about first grade level, made no reversal errors, although 
she had ample opportunity to do so. On the other hand the most 
superior readers occasionally made reversal errors, explanations of 
which have already been suggested. In general one would expect to 
find some relationship between marked tendency toward reversal 
errors and reading deficiency, because the same factors (immaturity,
lack of experience, incorrect instruction, emotional factors such as
inattention) would be expected to contribute both to the tendency
to make reversals in reading and to read poorly. This has been
pointed out by Monroe. In her study, cases of special reading
disability showed more reversal tendency than a group of normal
readers. Teegarden also reports that the child with more reversal
tendency makes poor progress in reading, but the situation with 
respect to possible causal factors is not analyzed. In the present 
data there is some indication that inadequate reading habits, incorrect 
eye movements, too great reading demands on immature children, 
lead pupils to make reversals in reading. Since the reversal tendency 
in no case was consistent it can scarcely be that the reversal tendency 
was the cause of poor reading. The relationship is probably more 
largely associative than causal. Each of the third and fourth grade 
groups of the present study contained children who were deficient in 
reading to the extent of two or more years in terms of grade standards 
of the diagnostic tests and second grade children who were deficient 
a year. 

Children who were markedly defective in hearing and vision were 
not members of the groups tested. Eye dominance was not deter- 
mined. Previous studies of eye dominance in relation to reading 
success and reversal tendency are confusing. Eye dominance is
difficult to measure accurately. Factors of selection varying from
study to study contribute to the confusion and the varied methods of
measuring dominance used by different research workers contribute
further to contradictory conclusions from different reports. The
study of Mintz indicates more mixed dominance in subnormal than
in normal children.


SUMMARY AND CONCLUSIONS 


The study of reading and writing reversals of public and private 
elementary school children under standard test conditions indicates 
a decline in frequency of the tendency in higher as contrasted with 
lower grades. None of the children examined showed a high degree 
of consistency in the tendency to make reversals. The number of 
reversals made in contrast to all other types of errors was infinitesimal. 
The reversals made were scattered over an appreciable percentage 
of all children examined. Some words and symbol elements are more 
subject to reversals than others. Items most frequently reversed 
proved to be either those on which children ordinarily receive little 
practice or with which they have little experience, or items that are 
more subtly indistinguishable than others. Some of the reversals 
may appear to be logical substitutions for the correct text of the
material read. Laws of association and configuration explain many
reversal errors. The number of reversals made by children declines
from grade to grade with no attention to reversal elements as such.
There is some indication of positive correlation between mental ability
and reversal tendency, but the brighter private school pupils at
second and third grade level made more reversals than public school
pupils, due to less practice in reading and writing. No cases of pure
mirror writing were found in the pupils studied. Children who receive
reading and writing instruction involving opposite systems of orientation,
English and Hebrew, show no more reversal tendency than those
who learn only one system. Left-handed children tend in the present
study to make on the average slightly more reversal errors than
right-handed children but the difference is so small as to be insignificant
statistically. We may conclude that left-handed and right-handed
children tend to make practically the same number of reversal errors.
There is some tendency for the poorest readers to make more reversals
than good readers, just as the poorest readers made more kinds of
all other types of errors than good readers. The relationship would
appear to be associative. The inconsistency of the reversal tendency
prevents a conclusion that reversal tendency is a cause of poor reading.


THE INFLUENCE OF THE FORM OF ITEM ON THE
VALIDITY OF ACHIEVEMENT TESTS¹


WALTER H. MAGILL 


University of Pennsylvania 


Presumably every qualified person, no matter how contentious, will 
grant that the degree of validity possessed by any measuring instru- 
ment in any field of measurement is a matter of paramount importance; 
probably most qualified persons in the field of educational measure- 
ment will also readily grant that the validation of published achieve- 
ment tests leaves much to be desired, particularly if the validation is 
with respect to the life values of instruction. So much granted, any 
factor influencing the validity of achievement tests assumes a high 
degree of importance. This paper will, as indicated by the title, con- 
cern itself with one of these factors, the form of test item, and will 
present some evidence regarding its significance. 

There seems to be a widespread assumption on the part of achieve- 
ment test constructors and authorities that recall, multiple response 
and true-false forms of item are sufficiently equivalent in validity to 
justify indiscriminate use from the standpoint of validity. This is
indicated indirectly by the manner in which the forms are used in 
published tests and explicitly by statements in standard books in test 
construction. For instance, Ruch² says: “When validity coefficients
are corrected for attenuation, the resulting values are high, showing
that true-false, multiple choice and recall tests measure roughly the
same abilities,” and his recommendations concerning the selection of
item forms for use in a test include many other considerations but not
that of effect on validity. O’Dell³ says: “It is very probable that for
particular bodies of subject-matter and for special purposes certain 
forms of exercises yield more valid results than do others,” thus seem- 
ing to indicate a contrary point of view, but he goes on to say: “In 
general it appears that at least all the more commonly used forms of the 
new examination differ so little in regard to validity that it need not be

1 Presented before Section Q, The American Association for the Advancement
of Science, at Atlantic City, December 28th, 1932.

2 Ruch, G. M.: “The Objective or New Type Examination.” Scott, Foresman
& Co., N. Y., 1929, p. 290.

3 O’Dell, C. W.: “Traditional Examinations and New Type Tests.” Century
Co., N. Y., 1928, p. 249.

considered as a factor in selecting the type to be used.” Tiegs¹ says:
“In general, so far as measurement techniques permit us to determine,
true-false, multiple choice and completion tests measure approxi- 
mately the same thing,” and in another place: “Evidence available 
indicates that the three most used types of new type tests are approxi- 
mately equal in validity.” This assumption seems to be based on 
two types of evidence, one the similarity in size of coefficients of 
correlation between alternate test forms made up of test items of the 
various types and certain criteria of validity; the other, high intercor- 
relations between alternate test forms. The writer has been skeptical 
of the assumption on three grounds; first, recall, multiple response and 
true-false items apparently require greatly dissimilar types and degrees 
of recall; second, the criteria employed in the studies of comparative 
validities (cf. Ruch, op. cit. pp. 281-290) have been various combina- 
tions of essay type examinations, objective type examinations, instruc- 
tors’ estimates, pupils’ estimates and term grades, all academic and 
questionable substitutes for the life values which supposedly form the 
objectives of present-day education; third, coefficients of correlation 
are heavily smoothed averages and the graph of relationship between 
the Pearson r and its error of estimate as r varies from zero to one is 
not rectilinear but the quadrant of a circle,² therefore numerically high
coefficients of correlation may be accompanied by considerable variability
in the contributing data, and, when used as the sole basis for
inference, are likely to lead to erroneous conclusions. The writer has
been impelled, therefore, to go back of the correlations of gross scores 
to the contributing data in a search for further evidence regarding the 
equivalence in validity of the three forms. A description of the 
investigation follows. 
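
The remark about the quadrant of a circle can be made explicit. The following is only a sketch, assuming that the “error of estimate” meant is the standard error of estimate expressed as a fraction of the standard deviation of the predicted variable; the symbol $k$ is a label introduced here for illustration and does not appear in the text.

```latex
% Assumed reading: k is the standard error of estimate divided by sigma_y.
\[
  k \;=\; \frac{\sigma_{\mathrm{est}}}{\sigma_{y}} \;=\; \sqrt{1 - r^{2}},
  \qquad\text{so that}\qquad r^{2} + k^{2} = 1 .
\]
% Plotted for 0 <= r <= 1, the points (r, k) lie on a quarter of the unit circle.
% Example: r = .80 gives k = .60, and r = .90 gives k of about .44.
```

On this reading a coefficient as high as .90 still leaves an error of estimate of roughly 44 per cent of the original variability, which is the sense in which high coefficients may accompany considerable variability in the contributing data.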

Three forms of a miscellaneous information test of fifty items
(Toops³) were given to two of the writer’s classes, made up chiefly of
teachers in service, the first forty-four and the second fifty-four in
number. The first form of the test contains the fifty items in one word 
answer form, the second contains the same items in five response form 
and the third the same items in true-false form. The three forms were 
given in the order named, one immediately following the other, and so

1 Tiegs, E. W.: “Tests and Measurements for Teachers.” Houghton, Mifflin
& Co., N. Y., 1931, pp. 251, 252.

2 Cf. Kelley, T. L.: “Statistical Method.” Macmillan Co., N. Y., 1924, p. 174.

3 Cf. Toops, H. A.: “Trade Tests in Education.” Teachers College Contributions
to Education, No. 115, pp. 39-47.

supervised that there was no opportunity for the subjects to learn the
answers during the test period (other than through the incidental 
practice effect of the tests themselves). The tests were given to class 
one as speed tests, with limits of seven, five and three minutes, respec- 
tively, which permitted only a few of the most rapid to finish. All 
members of class two were given sufficient time to finish each test and 
each, as he finished, noted the time of finishing. The distribution of 
times consumed follows.

                          Minimum     Q1     Median     Q3     Maximum
One word answer form        4:00     6:50     8:05     9:00     14:00
Five response form          1:20     3:40     4:30     6:00      9:00
True-false form             1:30     2:20     2:40     3:00      5:00

Intercorrelations between gross scores were calculated and the 
responses of each subject to each item of the three forms were compared 
to determine the number of inconsistencies of response, i.e., responses
correct on one form and incorrect on another. Since two forms of a 
test item can scarcely be considered to be equivalent in validity when 
one form is answered correctly and the other incorrectly, the relative 
frequency of inconsistencies affords important evidence regarding the 
general equivalence in validity of the forms. The performances of 
three individuals on the three tests making, respectively, high, moder- 
ate and low recall scores are given below in graphic form as samples of 
the findings. Each dot represents a correct response on the corre- 
spondingly numbered test item. Gross scores and inconsistencies in 
per cents of the total numbers of pairs of items answered are given for 
easy comparison on the right. Below the samples is a complete 
tabulation of the intercorrelations and of the inconsistencies in per 
cents. 
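
The counting of inconsistencies described above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample responses are hypothetical, and it assumes that an item left unanswered on either form is dropped from the count of “pairs of items answered.”

```python
from typing import Optional, Sequence


def inconsistency_percent(
    form_a: Sequence[Optional[bool]],
    form_b: Sequence[Optional[bool]],
) -> float:
    """Per cent of item pairs, answered on both forms, that are scored differently.

    True = correct, False = incorrect, None = item not answered (assumption).
    """
    if len(form_a) != len(form_b):
        raise ValueError("both forms must cover the same items")
    # Keep only the items the subject answered on both forms.
    answered = [(a, b) for a, b in zip(form_a, form_b)
                if a is not None and b is not None]
    if not answered:
        return 0.0
    # An inconsistency is a response correct on one form and incorrect on the other.
    inconsistent = sum(1 for a, b in answered if a != b)
    return 100.0 * inconsistent / len(answered)


# Hypothetical responses of one subject to the same five items on two forms.
recall = [True, False, True, None, False]
true_false = [True, True, False, True, False]
print(inconsistency_percent(recall, true_false))  # 2 of the 4 answered pairs disagree
```

Note that the count is symmetrical: an item right on recall and wrong on true-false weighs the same as the reverse, which is why the direction of the inconsistencies has to be read from the item-by-item record rather than from the percentage.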

It will be noted that the intercorrelations are in general high, that 
they are of the order of the size of those reported by Ruch (op. cit.) 
upon which assumptions of equivalent validities are based, but that 
they are accompanied by inconsistencies in response which are not 
only high in average but are widely variable in size and, as indicated 
in the samples of complete scores, are not all in one direction. In cer- 
tain functions of achievement tests, such as diagnosis and the deter- 
mination of the specific gains from instruction, the responses to the 
specific items of a test are more important than the gross scores; for
such functions the degrees of, and variations in, inconsistency seem
entirely too large to justify assumptions of equivalence in validity. 
For certain other functions, however, such as grading, classification 
and selection, the comparison of gross scores is involved and the 
responses to specific items are unimportant; for such functions it is 
possible that a high degree of inconsistency is not significant. If the 
inconsistencies were largely due to the opportunities for guessing pro- 
vided by the multiple response and true-false forms the effect might 
be largely removed from the gross scores by the conventional correc- 
tions for chance errors. If because of constant differences in difficulty 
or other reason the inconsistencies bore a constant ratio to the con- 
sistencies the gross scores might still be valid indirect measures of 
relative achievement. 

To secure evidence regarding the influence of corrections for chance
upon intercorrelations, the intercorrelations obtained with uncorrected
scores were compared with those obtained with the scores of
the multiple response and true-false tests corrected by the formula
$R - \frac{W}{n - 1}$, that is, R - W for the true-false form and
R - ¼W for the five response form. If the inconsistencies were
appreciably due to guessing the intercorrelations should be
proportionately raised by the corrections for chance. The correlations
obtained follow.

Correlation between                                      Class 1        Class 2
Recall vs. true-false, uncorrected                     .61 ± .066     .76 ± .043
Recall vs. true-false, R - W                           .52 ± .076     .84 ± .027
Recall vs. five response, uncorrected                  .88 ± .020     .91 ± .015
Recall vs. five response, R - ¼W                       .85 ± .014     .90 ± .017
Five response vs. true-false, uncorrected              .60 ± .064     .91 ± .015
Five response, R - ¼W, vs. true-false, R - W           .72 ± .049     .85 ± .026

It will be noted that four of the six coefficients are reduced in size 
and two are increased and that each of the increases is paralleled by a 
corresponding reduction in the other class. No evidence is forth- 
coming, therefore, that the effect of the inconsistencies can be reduced 
by corrections for guessing. 
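
The correction for chance applied to the multiple response and true-false scores can be sketched as follows. This is a minimal illustration, assuming the conventional form R - W/(n - 1), which agrees with the R - W and R - ¼W labels in the table; the function name and the sample counts are hypothetical.

```python
def corrected_score(right: int, wrong: int, n_choices: int) -> float:
    """Score corrected for chance: rights minus wrongs divided by (n_choices - 1).

    Omitted items are assumed to count neither as right nor as wrong.
    """
    if n_choices < 2:
        raise ValueError("an item needs at least two alternatives")
    return right - wrong / (n_choices - 1)


# Hypothetical subject with 30 right and 8 wrong on each form.
five_response = corrected_score(30, 8, 5)  # R - W/4
true_false = corrected_score(30, 8, 2)     # R - W
print(five_response, true_false)
```

The same number of wrong answers costs four times as much on the true-false form as on the five response form, so the correction changes the relative standing of subjects only in so far as their numbers of wrong answers differ.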

If the inconsistencies bear a constant ratio to the consistencies, the 
correlations between inconsistencies and gross scores should be high. 
It seems reasonable, also, to suppose that they may be negative,
because inconsistencies in response would probably be more frequent
with the partially learned elements of knowledge represented by
moderate and low scores than with the more perfect knowledge repre- 
sented by high scores. The correlations between inconsistencies and 
gross scores follow.

Correlation between                                      r, class 1      r, class 2
Recall scores vs. R-M inconsistencies                   -.17 ± .096     -.42 ± .074
Recall scores vs. R-T inconsistencies                   -.60 ± .065     -.66 ± .056
Multiple response scores vs. M-T inconsistencies        -.32 ± .094     -.64 ± .053

It will be seen that these coefficients are consistently negative, 
that none are high and that they vary widely in size. They do not 
support, therefore, assumptions of high equivalence in validity in the 
three tests. 

It must be recognized that the results of this investigation are by
no means conclusive; the numbers of cases are small and the results
may not be representative of those which would be obtained by changes
in type of individual tested, items of mental attainment or other factors.
The investigation does demonstrate possibilities, however:
that high intercorrelations may be associated with high percentages of 
inconsistency in the responses, that these percentages may be widely 
variable in size and in distribution and that corrections for chance do 
not necessarily eliminate the effects of the inconsistency from gross 
scores. Such possibilities seem to be significant enough to throw upon 
anyone who uses test item forms indiscriminately the burden of proof 
of the validity of such use. 

As an alternative to the indiscriminate use of test item forms the 
writer offers the following considerations. It seems axiomatic that a
high degree of validity in any process of measurement is most readily
and most surely obtained, when circumstances make this feasible, by
direct measurement, i.e., the employment of a measuring instrument
which possesses in outstanding degree the characteristic to be measured 
and the direct comparison of this characteristic in object and instru- 
ment; e.g., the use of a yardstick to measure the length of a room, the 
use of a gallon measure to determine the capacity of an automobile 
cooling system. When indirect measurement is employed, as in the 
employment of the expansion of the liquid in a thermometer to measure
increases in temperature, validity in the measurement is contingent
upon clearly established, definite relationships between the
characteristic to be measured and the characteristic employed in 
measurement. 

Applying the principle of direct measurement to achievement 
tests, validity with respect to life values requires that test items shall 
exercise the same mental functions in the same manner as do the life 
situations for which the instruction under measurement is designed to 
fit the learner. Direct measurement demands, for instance, that 
engineering students who are being given an examination to test their 
ability to solve problems met by the practicing engineer shall be per- 
mitted to use an engineering handbook, rather than be required to 
recall the formulas needed, since practicing engineers rely upon handbooks;
that home economics students shall be tested for their knowledge
of the proper procedure for making cocoa by an item in keeping
with the complexity of the procedure and not with the three response 
item frequently found in published tests; that the ability to recall 
names, uses, items of procedures, etc., be tested by simple recall types 
of item; that multiple response types be restricted to the measurement 
of ability to choose appropriately from a corresponding number of 
alternatives; that true-false items be restricted to the testing of the 
awareness of the truth or falsity of a statement. 

It is quite possible that an apparent similarity between the mental 
functions exercised by a test item and those demanded by an educa- 
tional objective may at times be superficial or illusory, but since an 
adequate degree of validity is of paramount importance in measure- 
ment and since it is extremely difficult to determine the degree that is 
present in a test, it would seem that test constructors are on much 
safer ground when they use every effort to secure as test items direct 
measures of items of mental attainment than when they use test item 
forms indiscriminately, as indirect measures, under assumptions of 
equivalence in validity. 


SUMMARY 


There is a widespread assumption that recall, multiple response and 
true-false forms of achievement test item are sufficiently equivalent in 
validity to justify indiscriminate use from this standpoint. This 
assumption apparently is based in large measure on the high intercor- 
relations between test forms which have been obtained. 

The investigation herein described, while not conclusive, demonstrates
the following as possibilities:

(a) High intercorrelations may be accompanied by high percentages
of inconsistency in the responses to specific items.

(b) The percentage of inconsistency is widely variable in size and 
also varies inversely with the gross scores, so that it cannot be consid- 
ered to be due to the influence of constant factors, which might be 
eliminated by statistical treatment of the scores. 

(c) The influence of the inconsistency in response upon the gross
scores is not consistently reduced by correcting the scores for chance.

All of these possibilities are inconsistent with equivalence in validity 
and place upon anyone who employs test item forms indiscriminately 
the obligation of proving the validity of such use under the attending 
circumstances. 

Therefore test constructors are on safer ground when they strive 
to so select and use test item forms that they represent direct measures 
of the items of mental attainment under measurement than when 
they use the forms indiscriminately under assumptions of equivalence 
in validity.


THE RELATION BETWEEN INSTRUCTIONS AND PAST
EXPERIENCE IN A SIMPLE OBSERVATIONAL TASK 


HARRY HELSON 


Bryn Mawr College 


The effects of instructions, Aufgabe, set, and attitude upon speed 
and type of response have been known since the early reaction-time 
and attention experiments, the work of the Würzburg school, and
especially more recent studies, notably by Starch, Peterson, and 
Dashiell. The part played by these factors has usually been attributed 
to their connection with or effect upon “past experience” and
“association.” Thus Starch, in introducing his experiments on completing
skeleton words under classified and miscellaneous headings says: “The
particular meaning given to a group of sensations is determined not 
only by the general mass of previous experiences but also by the 
particular system of past associations dominant in the mind at 
the time, that is, meaning is determined by the present setting of the 
mind.”¹ Peterson takes essentially the same position in discussing
the fact that an active attitude, the will to learn, is more effective in 
immediate and delayed recall than a passive attitude, when he says 
that the active attitude has the advantage because it affords “easy
and numerous associations,” although he admits that there are other,
at present unknown factors involved.² Dashiell strongly emphasizes
the importance of set when he says, recounting his experiments in 
which subjects took less time to complete lists of similar arithmetic 
problems than mixed, that “The behavior toward the stimuli was as
much a function or result of the attitude set up by the instructions
as it was of the stimuli themselves and of previously built-up habits of
response thereto.”³ But his later discussion of this topic indicates
that he regards the function of the instructions to lie merely in their
determination of which particular old response will be revived out of
the many possible ones from past experience.⁴





1 Starch, D.: “Experiments in Educational Psychology.” New York, 1924,
p. 146.

2 Peterson, J.: The effect of attitude on immediate and delayed reproduction,
etc. J. Ed. Psychol., Vol. VII, 1916, 523ff.

3 Dashiell, J. F.: “Fundamentals of Objective Psychology.” New York,
1928, p. 282.

4 Ibid., 370ff.

The fact is often overlooked that when the instructions evoke a
response which has not been made in the past or one which would
never have been expected on the basis of the subject’s past behavior,
the theory which allows such factors as set or instructions merely to
select or act in accordance with past experience and associations is
untenable. Psychologists as well as laymen are prone to emphasize
those cases in which past experiences help us in meeting new situations
and to neglect the equally numerous cases in which past experiences
prevent us from making adequate adjustments to novel demands.
James recognized the fact that past experience cannot be relied upon 
to furnish us with ready-made useful adjustments when he wrote: 
“Hardly ever can a youth transferred to the society of his betters 
unlearn the nasality and other vices of speech bred in him by the 
associations of his growing years. Hardly ever, indeed, no matter 
how much money there be in his pocket, can he ever learn to dress 
like a gentleman-born . . . An invisible law, as strong as gravitation, 
keeps him within his orbit, arrayed this year as he was the last; and 
how his better-bred acquaintances contrive to get the things they 
wear will be for him a mystery till his dying day.”¹ Experimental
studies bear out James’ illustration from every-day life but they are 
usually overlooked, so plausible does the past experience theory seem, 
so well does it fit with much current psychological thinking. 

It is difficult, if not impossible, to devise an experimentum crucis 
for the past experience theory because of the many different senses 
in which it may be and has been used by various psychologists. As I
have pointed out elsewhere,² past experience contains so many possibilities
of response that it furnishes an explanation for all cases, after
they have occurred. Prediction of crucial cases is not so easy. In
this study the attempt is made to test anew the past experience and
association theory. We wish to determine if past experiences and
associations will have more effect upon a simple performance than
other factors, chiefly the set induced by changed instructions. It
might be argued that past experiences are operative in any case,
whether the instructions are changed or not, but a consideration of 
our procedure will show that we can, to some extent, separate the past 
experience and associative factors, on the one hand, from the effect 
of the instructions on the other hand. 





1 James, W.: “Principles of Psychology.” New York, Vol. I, 1890, p. 122.

2 Helson, H.: Studies in the theory of perception: 1. The clearness-context
theory. Psychol. Rev., Vol. XXXIX, 1932, 59ff.

We can decide if past experiences and associations determine a
simple reaction more than the instructions under which the Os work 
by the following procedure: We present playing cards to three groups 
of subjects, some “normal” and others altered in various ways. The 
latter will be called the ‘‘test’’ cards hereafter. All three groups 
are given one hundred forty-four exposures of the normal cards with 
the first set of instructions. The first and second groups are then 
given a series in which normal and test cards are mixed, but still 
with the first instructions. The first group is next given a mixed 
series with a second set of instructions. The third group is given the 
mixed series with the second instructions, never having had a mixed 
series with the first instructions. The third group has therefore had 
less practise than the first group with both normal and test cards. 
If we leave aside the question of the changed instructions, we should 
expect the first group to be better in detecting test cards with the 
second instructions than the third because of its greater amount of 
practise, greater number of associations with the materials used, and a 
richer past experience. If Group 3 equals or excels Group 1 with the 
second instructions, then the past experiences and practise of Group 1 
do not count for very much beside the instructions. We can thus 
determine the relative importance of these factors in so far as they can 
be varied independently of each other. After a more detailed descrip- 
tion of our procedure we shall discuss the results. 


EXPERIMENTAL PROCEDURE 


The playing cards were exposed on a ground glass screen by means 
of a Bausch and Lomb projection lantern fitted with a special shutter 
by which exposure times as short as %o sec. could be obtained. 
The images of the cards on the screen were 23 × 35 cm. in the majority 
of the experiments and the Os sat at a distance of 171 cm. from the 
screen, heads supported by Stoelting head rests. The Os fixated the 
center of the screen at a “Ready” . . . “Now” signal and the cards 
were flashed on the screen with an exposure time of 19 sec. 

The cards consisted of two sets: The first set were normal playing 
cards just as they came from the store; the second set, consisting of 
thirty-six cards taken from a normal pack, were altered in some way 
and hence will be called the “test” set. The alteration on a test 
card might be as slight as the substitution of the figure 8 for a six in 
the upper right-hand corner of the card, or the formation of a new 
card by pasting together the two halves of different cards.¹ The 
observers, divided into three groups, were all shown thirty-six normal 
cards in one hundred forty-four exposures under the first set of instruc- 
tions. This gave all Os the same amount of practise previous to the 
exposure of test cards in random order among the normal cards. The 
errors made in reporting normal cards include those made with the one 
hundred forty-four exposures in the practise series as well as those 
made in the test series when both normal and test cards were given. 

The thirteen Os acting as subjects were all students at Bryn Mawr 
and more or less addicted to playing bridge, hence familiar with 
playing cards. Five Os acted as subjects in Group 1, four in Group 
2, and four in Group 3. The experiments with Groups 2 and 3 were 
conducted by another experimenter² about one year after the experi- 





1 The test cards were made by changing normal cards as follows: Two clubs 
with spade spot for bottom club; three hearts with diamond spot at top; half two 
diamonds, half two hearts; ace hearts with diamond under top A; ace spades with 
club under top A; eight diamonds with heart top center; six diamonds with ten 
hearts lower right corner; top four diamonds, bottom ace diamond, three diamond 
spots in triangle; seven diamonds with heart under top seven; two spades with club 
spot at bottom; eight spades with ten clubs in lower right corner; six hearts with 
eight in upper corner; three diamonds with heart spot in center; seven clubs with six 
in upper corner; six spades with lower left spot upside down; nine clubs with 
spade spot at right; nine spades with club spot in upper right; three spades with 
bottom spot club and club under bottom three; five hearts with right top heart 
turned on side; five clubs top, bottom three clubs; eight clubs with spade in the 
middle top spot; four hearts with diamond on side at left top; half king hearts, 
half jack hearts; five diamonds, middle spot turned on side; nine hearts with top 
left number nine diamonds; half queen diamonds, half jack diamonds; seven spades 
with club spot in middle of left row; five spades with center spot on side; seven 
hearts with diamond spot in middle left row; jack clubs with king face at top; 
ten clubs with ten spade in upper corner; four clubs with four spade in upper left 
corner; king spades with club spot at left top; queen hearts with diamond spot at 
right top; half queen spades, half queen clubs; king diamonds with queen head on 
top half. 

It should be noticed that no color changes were made with the test cards as 
we wished to keep the subjects in ignorance as long as possible of the conditions 
of experimentation. That equally successful results can be obtained with color 
changes has been shown in two studies by C. A. Dickinson, A. J. Psychol., Vol. 
XXXVII, 1926, p. 342, and Vol. XXXVIII, 1927, pp. 266-279. 

2 I wish to make acknowledgments to the Misses A. V. Grant and E. M. 
Chalfant who conducted the experiments. Dr. L. M. Crabbs, who has read the 
manuscript of this study, has raised the question of the abilities of our three 
groups. While no special precautions were used to obtain matched groups, and 
the numbers are small, I do not believe that differences in ability played much 
ments with Group 1 had been performed and our results are therefore 
free from any experimental error in this direction, i.e., they cannot be 
said to be due to laboratory atmosphere, suggestion or any other 
personal factor in the experiments. It will be obvious that only by 
keeping the subjects in complete ignorance of the conditions and 
purpose of the experiments can our findings be duplicated. Since we 
used two experimenters with three different groups of Os we feel that 
the results cannot be attributed to laboratory atmosphere or any 
factors extraneous to the conditions of the experiments. 

The two sets of instructions were as follows: (1) “You will be 
shown a number of playing cards and asked to report upon them. 
Please report on the color, number of spots, identify the numerals, and 
give as many details or as complete descriptions as you can. Fixate 
the center of the screen. You will be given a ready-now signal before 
the card is exposed.” (2) “You will be shown a number of playing 
cards. Notice them carefully. Some of them have been changed 
and some are normal playing cards. Please be careful not to miss any 
of the changes. Report them to me.” 

After the practise series of one hundred forty-four exposures of nor- 
mal cards with the first set of instructions, Group 1 was presented with 
a test series consisting of two hundred normal and fifty test cards scat- 
tered among them, with the first set of instructions. At the completion 
of this series they were given the second set of instructions and a 
series consisting of one hundred normal and fifty test cards. 
Group 2, after the practise series with the first instructions, was given 
the two hundred normal and fifty test cards still with the first set of 
instructions. Group 3, after the practise series and the first instruc- 
tions, was given one hundred normal cards and fifty test cards but 
now with the second instructions. We can therefore compare a 
group having both sets of instructions with a group having only the 
first or the second set of instructions in the test series, as well as with 
itself on both test series under different instructions. Any change 
reported by the Os was counted as a detection, whether in the practise 
series or test series, whether seen on normal cards, and hence incorrect, 
or on test cards. 

The results for the different groups with the normal and test 
cards under the two sets of instructions are given in Table I. 





part in determining the results. The results are so clear-cut and fit in so well 
with the work of previous investigators that our method of choosing subjects, 
by chance, seems to have justified itself. 
TABLE I.—PERCENTAGES OF TEST AND NORMAL CARDS REPORTED AS CHANGED 
UNDER EACH SET OF INSTRUCTIONS BY THE Os IN GROUPS 1, 2, AND 3 

                  First instructions           Second instructions 
Group    O      Test cards   Normal cards    Test cards   Normal cards 
1        C         32           0               60           6.7 
1        G          0           0               52           4.2 
1        H          0           3.3             50           3.4 
1        Q         20           0               58           3.4 
1        Y          4           0               46           0 
1        Av.       11.2         0.66            53.2         2.92 
2        D          4           1.6             -            - 
2        N         12           1.3             -            - 
2        S         22           4.9             -            - 
2        Sn        52           4.9             -            - 
2        Av.       22.5         3.15            -            - 
3        Ba         -           -               46          19.6 
3        Bl         -           -               66           3.6 
3        Br         -           -               58           2.4 
3        F          -           -               76           3.2 
3        Av.        -           -               61.5         7.2 




















DISCUSSION OF RESULTS 


The results in Table I practically speak for themselves. The Os 
in Group 1 do nearly five times as well under the second set of instruc- 
tions as under the first, while the Os in Group 3 stand nearer to the 
Os in Group 1 in their better performance than in their performance 
under the first set of instructions. Group 2 stand nearer Group 1 in 
their results with the first instructions. Even the Os in Group 1 
who made a large number of detections under the first set of instruc- 
tions practically doubled the number of detections under the second 
set of instructions (C and Q). The effect of instructions seems to be 
greatest upon the poorer Os whose improvement is most remarkable, 
as a comparison of the percentages of G, H, and Y under the two sets 
of instructions shows. The differences between Os both within Group 
1 and among groups would seem to lie in the fact that some Os either 
“caught on” during the first test series and under the first set of 
instructions or were better observers. The former explanation seems 
the correct one since the poorer Os approach so nearly to the records 
made by the best in the second instructions. In any case, it appears 
that externally imposed instructions are more effective than self- 
administered instructions, if we regard the “catching-on” as equivalent 
to self instructions. We predict that even an O like Sn whose per- 
centage in the first instructions equals the average percentage of 
the Os in Group 1 in the second instructions would improve her 
performance if given the second instructions because of the greater 
efficacy of the “other-administered” instructions as against the self- 
administered type. 
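
The comparisons drawn in this paragraph can be checked with a little arithmetic. The following is only a sketch: the percentages are transcribed from the test-card columns of Table I, while the dictionary layout and the helper names (`mean`, `group_means`) are illustrative assumptions and not part of the original study.

```python
# Percentages of test cards reported as changed, transcribed from Table I.
# None marks a condition that a group never received.
table_1 = {
    "Group 1": {"first": [32, 0, 0, 20, 4], "second": [60, 52, 50, 58, 46]},
    "Group 2": {"first": [4, 12, 22, 52], "second": None},
    "Group 3": {"first": None, "second": [46, 66, 58, 76]},
}


def mean(values):
    """Arithmetic mean of a non-empty list of percentages."""
    return sum(values) / len(values)


def group_means(table):
    """Average each group's detections under each set of instructions."""
    return {
        group: {
            condition: (round(mean(scores), 1) if scores is not None else None)
            for condition, scores in conditions.items()
        }
        for group, conditions in table.items()
    }


means = group_means(table_1)

# Ratio behind the "nearly five times as well" statement for Group 1.
ratio = means["Group 1"]["second"] / means["Group 1"]["first"]

for group, values in means.items():
    print(group, values)
print("Group 1, second/first:", round(ratio, 2))
```

The computed averages agree with the "Av." rows for test cards in Table I, and the Group 1 ratio comes out a little under five.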

The fact that Group 3 excels Group 1, in spite of the fact that 
Group 1 had had two hundred fifty more exposures, shows again the 
importance of instructions. It appears that practise may be of less 
importance than certain factors entering into the actual conditions 
of the performance, for Group 3 does better than Group 1 which had 
more practise. Whether or not the difference between these two 
groups is statistically significant is beside the point for if Group 3 
had merely equalled Group 1 we should have to regard this as sig- 
nificant because of the fact that Group 3 had only a little more than 
half as much practise with the material as Group 1. In other words, 
if Group 3 had only equalled Group 1, we should have to grant that 
the second instructions make for better performance with less practise. 
The superiority of Group 3 over Group 2, and Group 1 with the second 
instructions, over Group 2, is further confirmation of this point. We 
are left with the changed instructions as the most effective factor and 
the only reasonable explanation of these results. 
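
The statements about relative amounts of practise follow from the series lengths given under Experimental Procedure. As a minimal sketch (the constant and variable names are assumptions chosen for illustration), the exposure totals can be tallied as follows:

```python
# Series lengths as described in the procedure.
PRACTICE = 144            # normal cards, first instructions (all groups)
MIXED_FIRST = 200 + 50    # normal + test cards, first instructions
MIXED_SECOND = 100 + 50   # normal + test cards, second instructions

exposures = {
    "Group 1": PRACTICE + MIXED_FIRST + MIXED_SECOND,
    "Group 2": PRACTICE + MIXED_FIRST,
    "Group 3": PRACTICE + MIXED_SECOND,
}

# How much more practise Group 1 had than Group 3, in absolute and relative terms.
extra = exposures["Group 1"] - exposures["Group 3"]
share = exposures["Group 3"] / exposures["Group 1"]

print(exposures)
print("extra exposures for Group 1:", extra)
print("Group 3 as a share of Group 1:", round(share, 2))
```

On these figures Group 1 received two hundred fifty more exposures than Group 3, and Group 3 had a little more than half of Group 1's total.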

It would appear that the most perfect types of response come into 
being through other factors than practise and the operation of old 
associations. Past experiences may be of very little significance 
beside other factors which determine what and how past experiences 
shall become effective. Past experiences are useless unless they are 
made effective by the conditions of action; they may be a hindrance 
under many circumstances. Nor do the concepts of past experience 
and association help us to explain behavior when used alone because 
we cannot be certain how they will function apart from the actual 
conditions of behavior. The direction, manner and type of organiza- 
tion must be taken into account as well as the materials used. Hence 
the importance of such terms as Aufgabe, set, attitude, and interest. 




















36 The Journal of Educational Psychology 


Two considerations remain to be discussed, one regarding the 
results, the other our interpretation of them. It will be seen from 
Table I that some normal cards were reported incorrectly by almost 
all Os with both sets of instructions, but more errors were made with 
the second than with the first instructions. It may therefore be urged 
that the second instructions are responsible for more errors as well as 
more correct responses. This is true and we shall not deny the 
cogency of the argument. But the number of detections is so much 
greater than the increase in the number of errors with the second 
instructions that the gain must be held to compensate for the loss. 
We have to decide in each case what we wish to get and what we can 
afford to sacrifice. Often some diminution in accuracy may be 
tolerated for a gain in other aspects of a performance. On the other 
hand, further investigation may prove that it is possible to frame the 
instructions in such a way as to produce the maximal number of 
detections with a minimum number of errors. 
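
The trade-off between detections and errors can be made concrete for Group 1, the only group tested under both sets of instructions. This sketch simply subtracts the averages printed in Table I; the variable names are illustrative, and the differences are in percentage points.

```python
# Group 1 averages as printed in the "Av." row of Table I.
group_1_av = {
    "first": {"test": 11.2, "normal": 0.66},
    "second": {"test": 53.2, "normal": 2.92},
}

# Gain in correct detections on test cards.
detection_gain = round(
    group_1_av["second"]["test"] - group_1_av["first"]["test"], 2
)

# Increase in false reports of change on normal cards.
error_increase = round(
    group_1_av["second"]["normal"] - group_1_av["first"]["normal"], 2
)

print("gain in detections (points):", detection_gain)
print("increase in errors (points):", error_increase)
```

The gain in detections is many times larger than the rise in errors, which is the sense in which the gain is held to compensate for the loss.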

The second consideration rests upon the fact that every exper- 
imental finding is a function of the conditions of experimentation and 
so it might be objected that the superiority with the second instruc- 
tions depended upon certain factors in the experiments, e.g., shortness 
of the exposure-time, nature of the materials used, etc. While allow- 
ing full force to this argument, our interpretation nevertheless possesses 
wide applicability. Every response is made under conditions favoring 
some factors and nullifying others. In many life situations, as well 
as laboratory situations, there are many conditions which minimize the 
part played by past experiences and associations in determining 
present responses. Past experiences and associations must be made 
effective by factors inhering within the situation confronting the 
organism or else they may fail to function or may be a positive hin- 
drance. The introspections of some of the Os bear us out in this 
point of view: Their great familiarity with playing cards, so they 
said, made them feel very certain that they could not be wrong 
when they reported a card as belonging to a given suit, number and 
“normal.” How many times have we not been handicapped by our 
past experiences because of our failure to see clearly and freshly what 
was novel in a situation and demanded new adjustments never before 
made. Our experimental conditions were therefore not unlike many 
life situations. 

Finally, we must point out that the adequacy and efficiency of 
responses cannot be explained wholly in terms of the amount of 










Relation between Instructions and Past Experience 37 


previous practise, richness of associations, and the like, since Group 3 
did as well as Group 1 with far less practise and fewer associations. A 
study of the two sets of instructions reveals that the second instruc- 
tions were more effective in that they required the Os to discriminate 
between normal and “changed” cards whereas the first instructions 
called for reports on details which may or may not have been appre- 
hended with those instructions. The effect of the instructions, as 
Lewin! has pointed out, is to release certain types of activity, some of 
which are more adequate than others for the problem at hand. We 
have here something more than a mere running off of old habits or 
recall of old associations. The Os are set to do different things by 
the two sets of instructions, hence the difference in performance with 
them. How past experiences and associations will function depends, 
therefore, upon the type and direction of the activities involved. It 
is to some principle over and above that of association we must look 
if we are to understand how the organism works. The set induced 
by the instructions and the type of activity called forth by them 
furnish some clue as to what an organism will and will not utilize from 
its past experiences and associations in the solution of new problems. 


CONCLUSIONS 


The unqualified and uncritical assumption that past experience 
and association account for new adjustments has been examined in 
the light of laboratory and common-sense observations. It has been 
found that the instructions, calling for now one kind of activity, now 
another, determine the type and adequacy of response more than 
amount of practise, richness of associations and past experiences with 
the type of material used. This is explained by the fact that the 
instructions determine what resources of the organism will be brought 
into play in solving its problems through their effect upon set and type 
of activity used. The emptiness of the concepts of past experience 
and association is pointed out since by themselves they do not tell 
us what experiences and associations will or will not be utilized. Some 
principle is needed to denote the type of organization and activity 
involved in any given response before we can predict how the organism 
will behave in the face of a problem. One such principle, investigated 





1 Lewin, K.: Das Problem der Willensmessung und das Grundgesetz der 
Assoziation, Psychol. Forsch., Vol. I, 1922, pp. 191-302; ibid., Vol. II, pp. 65-141. 
For a résumé of Lewin’s work see the writer’s The Psychology of Gestalt. Am. J. 
Psychol., Vol. XXXVII, 1926, pp. 50ff. 
in this study, is contained in the concept of Aufgabe, set or instructions 
which determine the kind of activity and hence the utilization of past 
experiences and associations. If past experiences and associations 
are to be effective in determining present responses then they must be 
organized, directed and set into action by the operation of some factor 
contained within the stimulating situation. 
VALIDITY OF THE WOODWORTH-MATHEWS 
PERSONAL DATA SHEET FOR DIAGNOSING CERTAIN 
PERSONALITY DISORDERS 
J. WAYNE WRIGHTSTONE 
Institute of School Experimentation, Teachers College, Columbia University 


Is the Woodworth-Mathews Personal Data Sheet a valid instru- 
ment for diagnosing specific traits of personality in individuals when 
it is administered as a group test to typical classes, fourth grade and 
older? That was the question that faced the present writer in a 
comprehensive study of pupil adjustment, upon which he is now 
engaged. Most of the pupils considered in this study were rated 
more or less normal in personal and social adjustment according 
to criteria of cursory teacher observation, opinion, and estimate; 
that is, most of them did not deviate conspicuously from average 
behavior and conduct. Now, the Woodworth-Mathews Personal 
Data Sheet comprises seventy-five items, or questions, which attempt 
to secure data from each pupil. These data may be classified roughly 
under the following heads: Social adjustment, fears of things, attitude 
toward home, troubled dreams, nervous habits, eating habits, personal 
conflicts, or maladjustment, daydreaming, physical health, sadistic 
tendencies, persecution complex, phobias, and manias. One of the 
questions, for example, is: “Do you like to play with other children?, 
Yes, No.” 

It will be observed that, if the Personal Data Sheet is to be a valid 
instrument, the validity will depend upon two factors. First, the 
items comprising the test must be inclusive enough to make an 
inventory of the most significant aspects of personality and conduct. 
Second, the pupils’ answering of the items must be frank, sincere, and 
truthful. For some of the behavior traits, such as nervous habits, 
some social adjustments, daydreaming, certain personality conflicts, 
and the like, a conscientious and systematic observer may secure 
evidence of personality status by overt manifestations on the part 
of the individual. On the other hand, fears, phobias, dreams, feelings 
connected with physical health, and the like are often difficult for 
a teacher or observer in a school to discover by either direct or indirect 
overt activities. 

However, to the extent that it was possible, an extensive survey 
of personality traits was made from approximately one hundred pupils 
in grades five to nine, inclusive, of Summit, New Jersey, schools. 
Administrators, supervisors, teachers, but particularly the psychiatric 
social worker—called the Visiting Teacher—cooperated in securing 
an index of the validity of the Personal Data Sheet. The test was 
administered to eighteen fifth grade pupils, fifty sixth grade pupils, 
and twenty-four eighth and ninth grade pupils. The Visiting Teacher 
had done case work with thirteen of the eighth and ninth grade pupils 
taking the test. 


A PROPOSED TECHNIQUE FOR ESTIMATING THE VALIDITY OF THE 
PERSONAL DATA SHEET AS A DIAGNOSTIC INSTRUMENT 


Briefly stated, the proposed technique for estimating the validity 
of the Personal Data Sheet was: Two independent diagnoses were 
to be made, one by the teachers, supervisors, and Visiting Teacher 
who knew the pupils; the other diagnosis was to be made by the 
research worker from responses on the Personal Data Sheet. Then 
independent diagnoses of each pupil were compared for agreement 
of descriptions. In order to make the descriptions as comparable 
as possible, the following headings were used for the descriptions: 
Social adjustment with other individuals, home conditions, nervous 
habits, daydreaming, physical symptoms, fears of things, persecution 
complex, stealing, any symptoms of phobias or manias. These were 
chosen among the others because overt manifestations were more 
easily observed. 

The pupils in this study were chosen from three schools. School A, 
an elementary school of six grades, was located in a part of the com- 
munity where most parents engaged in commercial or professional 
vocations. School B, another elementary school of six grades, was 
located in a part of the community comprised largely of foreign born 
parents, who are for the most part small shopkeepers, skilled and 
unskilled laborers. School C, a junior high school, was very much like 
School A. The following are the scores of Schools A and B on the 
Sims Socio-Economic Scale: 


TABLE I.—MEDIAN SCORES OF PUPILS STUDIED IN SCHOOLS A AND B: SIMS 
SOCIO-ECONOMIC SCALE 

School   Grades tested   Median score   Sims’ description of the score 
                                        for socio-economic status 
A        V, VI           27             Very high 
B        VI              11             Medium 
The pupils tested in Schools A and B were more or less typical 
pupils for each school. In so far as their relative educational, social, 
emotional, and economic life may have been concerned, they appeared 
to be representative of the larger school group. However, thirteen 
of the pupils tested in the junior high school had been definitely 
regarded by the Visiting Teacher as deviates in personality adjustment. 
The Visiting Teacher had done case work with these thirteen, knew 
them fairly well, and had case histories of them in her files. Such is 
the general background of the pupils included in this study. 

In scoring the Personal Data Sheet, the score of each pupil is 
the number of unfavorable responses. Thus the higher the score the 
more emotionally unadjusted is the pupil. Dr. Ellen Mathews¹ 
has set up some tentative norms for age, grade, sex, nationality, and 
school retardation and acceleration. For an entire group, including 
both sexes, the median score of unfavorable responses is 23; for boys 
it is 20; for girls 25.5. Dr. Mathews has reported some interesting 
nationality differences. The median unfavorable response for Italian 
children is 36; for Jewish children it is 20; and for children of mixed 
Celtic and Teutonic stocks it is 16. 


TABLE II.—MEDIAN SCORES OF CLASSES COMPARED WITH TENTATIVE NORMS OF 
THE WOODWORTH-MATHEWS PERSONAL DATA SHEET 

School   Grade   Enrolment   Class    Tentative norms   Summit 
                             median   (nationality)     deviations 
A        5       18          10       16                 -6 
A        6       23           8       16                 -8 
B        6       27          13       36                -23 
C        7-9     24          10       16                 -6 




















The data presented in Table II show that, on the whole, the emo- 
tional stability of the Summit pupils is better than the tentative 
norms; for it must be remembered that the lower the score the better 
the adjustment. This may be due in some part to the intensive 
program of personality adjustment carried out by the school staffs, 
cooperating with the Visiting Teacher. These groups are decidedly 
better than the norms; hence, one might judge that it would be more 
difficult to diagnose individually such cases. If that be the true 





1 Mathews, E.: A Study of Emotional Stability in Children. Journal of 
Delinquency, Vol. VIII, 1923, pp. 1-40. 
CHART A.—AGREEMENT BETWEEN INDEPENDENT DESCRIPTIONS OF PUPILS’ 
EMOTIONAL STABILITY MADE BY MEMBERS OF THE SCHOOL STAFF FROM 
OBSERVATIONS AND BY THE INVESTIGATOR FROM PERSONAL DATA SHEETS 
implication to draw from such data, it lends more validity to the 
diagnoses that follow. 
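
The deviations reported in Table II are the class medians minus the tentative norms, so that a negative value indicates fewer unfavorable responses than the norm. The sketch below recomputes them; the tuple layout and the name `deviations` are assumptions made for illustration.

```python
# (school, grade, enrolment, class median, tentative norm) from Table II.
table_2 = [
    ("A", "5", 18, 10, 16),
    ("A", "6", 23, 8, 16),
    ("B", "6", 27, 13, 36),
    ("C", "7-9", 24, 10, 16),
]

# Deviation = class median - norm; lower scores mean better adjustment.
deviations = [
    (school, grade, median - norm)
    for school, grade, _enrolment, median, norm in table_2
]

total_enrolment = sum(row[2] for row in table_2)

for school, grade, deviation in deviations:
    print(f"School {school}, grade {grade}: {deviation:+d}")
print("pupils tested:", total_enrolment)
```

Every class falls below its norm, and the enrolments sum to the ninety-two pupils tested.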

The chart on p. 42 reveals the agreement between independent 
descriptions of the emotional status of pupils as judged by the school 
staff, particularly the Visiting Teacher, and the independent descrip- 
tions as the investigator made them from responses on the Personal 
Data Sheet. 

It may be noted that in some cases individual items of the analyses 
of the two descriptions do not agree. The general rule followed here 
in determining whether the independent descriptions on each pupil 
agreed was to ascribe agreement if more than fifty per cent agreement 
was found among maladjusted characteristics, or items. If they 
disagreed in more than fifty per cent of the items, then disagreement 
was ascribed. 


TABLE III.—AGREEMENT BETWEEN DESCRIPTIONS INDEPENDENTLY MADE FROM 
THE WOODWORTH-MATHEWS PERSONAL DATA SHEET AND BY MEMBERS 
OF THE STAFF OF THE SUMMIT SCHOOLS 

Total pupils   Cases on which independent descriptions   Percentage of agreement of 
tested         Agreed          Disagreed                 independent descriptions 
92             84              8                         91.3 











From the evidence summarized in Table III, it is evident that the 
Woodworth-Mathews Personal Data Sheet is a fairly valid instrument 
for discovering personality maladjustments, such as those listed in 
Chart A of this study. However, care should be exercised by limiting 
the generalization about its validity as a diagnostic instrument to 
those items for which data were gathered. While one might infer 
that if parts of the instrument show high validity, other parts ought 
to share it; yet, such reasoning is based upon speculative and indirect 
evidence rather than relevant data. 
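
The agreement rule and the percentage in Table III can be sketched in code. The text does not say exactly how item agreement was counted, so the version below is an assumption: it takes the maladjustment items noted in either description as the base and counts those noted in both. The treatment of an exact fifty per cent split and of two empty descriptions is likewise assumed, and the example item sets are hypothetical.

```python
def descriptions_agree(staff_items, sheet_items):
    """Apply the 'more than fifty per cent' rule to two sets of noted items.

    Assumption: the base is every item noted in either description, and
    an item counts as agreeing when both descriptions note it.
    """
    staff, sheet = set(staff_items), set(sheet_items)
    noted = staff | sheet
    if not noted:
        # Assumption: two descriptions noting nothing are taken to agree.
        return True
    shared = staff & sheet
    # Assumption: an exact 50 per cent split is not counted as agreement.
    return len(shared) / len(noted) > 0.5


def percent_agreement(agreed, disagreed):
    """Percentage of cases on which the two descriptions agreed."""
    return round(100 * agreed / (agreed + disagreed), 1)


# Hypothetical pupil: two of the three noted items appear in both descriptions.
staff_view = {"nervous habits", "daydreaming", "fears of things"}
sheet_view = {"nervous habits", "daydreaming"}
print(descriptions_agree(staff_view, sheet_view))

# Counts reported in Table III.
print(percent_agreement(84, 8))
```

With the counts from Table III the function reproduces the reported 91.3 per cent.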

In order to investigate what degree of relationship might exist 
between socio-economic status and ratings of the pupils’ personality 
adjustment, the coefficient of correlation between Sims Socio-Eco- 
nomic Scale and the Woodworth-Mathews Personal Data Sheet 
scores was computed. The Pearson r was —.52; that is, the better 
the socio-economic status the better the emotional stability of the 
pupils. 
Summary.—This study has provided a partial answer to the 
question: Is the Woodworth-Mathews Personal Data Sheet a valid 
instrument for diagnosing certain traits in personality in individuals 
when it is administered as a group test? It was shown that in items, 
such as social adjustment with others, home conditions, nervous 
habits, daydreaming, certain physical symptoms, fears of things, 
persecution complex, stealing, and symptoms of phobias or manias, 
there was an agreement of 91.3 per cent between independent analyses 
by the investigator and descriptions by the school staff. The analyses 
of the investigator were based upon responses to the Personal Data 
Sheet. The analyses of the school staff were based upon case studies 
by the Visiting Teacher and systematic observations by teachers, 
administrators and supervisors. 

Caution should be exercised in the interpretation of these findings. 
First, the proved validity is related to those items of the test that 
have been mentioned specifically; it was impossible at this time 
to get systematic observations for the other items. Second, this 
study comprised pupils from the fifth to the ninth grades, inclusive; 
and the findings may not be representative for pupils outside this 
range. 

The coefficient of correlation between scores on Sims Socio- 
Economic Scale and scores on the Woodworth-Mathews Personal 
Data Sheet was —.52 (PE of .07). Since the smaller scores indicating 
better emotional stability correlate with the larger scores for better 
socio-economic status, the correlation points to the fact that the 
better the economic status of the families the better are the chances 
that the pupils will be emotionally stable. 
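
The probable error quoted with the correlation can be related to the classical formula PE = 0.6745 (1 - r²) / √N. The number of pupils entering the correlation is not stated in the text, so the value of N used below is purely an assumption for illustration; the function name is also hypothetical.

```python
import math


def probable_error_of_r(r, n):
    """Classical probable error of a Pearson r computed from n cases."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


r = -0.52
n_assumed = 50  # Assumption: N is not reported in the study.

pe = probable_error_of_r(r, n_assumed)
print(round(pe, 2))
```

Under this assumed N the probable error rounds to the reported .07; a different N would of course give a different value.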

















TESTING THE CUMULATIVE KEY FOR PROGNOSIS OF 
MUSICAL ACHIEVEMENT 


HAZEL M. STANTON 
The Psychological Corporation, New York City 


Individual differences in musical talent determined for freshmen entering a 
four year course in a university music school are predictive of successful graduation. 


Science constantly is confronted with the task of discovering 
and isolating certain factors or elements which can be known apart 
from accompanying elements and of designating them in appropriate 
symbolic forms. This is clearly illustrated in any survey of facts or 
collection of informative material. The work of collecting data, 
followed by description, recognition, designation, classification and 
interpretation, is an endless task but one which holds the investigator 
in its intricacies, and baffles yet fascinates, in spite of the limitations 
which are ever apparent. 

Musical talent in all its complexity has been scrutinized and 
analysed into certain definite sections such as musical sensitivity, 
musical action, musical intelligence, musical memory and imagination, 
musical feeling. Phases of these larger conceptions have been further 
analysed and some of them lend themselves to measurement, such as 
the simpler forms of musical sensitivity, which are the three attributes 
of sound, viz., pitch, intensity and time; the complex forms of musical 
sensitivity of which consonance and rhythm are now measurable; 
two factors from the sections of musical memory and imagination 
and musical intelligence, which are tonal memory and comprehension. 
The six tests of pitch, intensity, time, consonance, tonal memory and 
rhythm are the Seashore Measures of Musical Talent. These six 
tests and the Iowa Comprehension test, a non-musical test, may be 
given as group tests and used in a test program for high school graduates 
desiring to concentrate in music. These few measures, although they 
cannot tell the whole story or represent musical talent in its entirety, 
are measurements of the essence of musical talent. This interpretation 
is based upon my experience with the tests in examining several 
thousand applicants to a music school followed by knowledge of the 
musical achievement of these applicants over a period of twelve 
years. Low musical capacities as measured retard musical growth, 
high musical capacities as measured enhance musical growth. 

The Seashore Measures of Musical Talent are hearing measurements, 
and each test includes one hundred responses scored in terms of 
centile ranks from an adult norm. The comprehension test, a 
non-musical test, consists of forty-five responses scored in terms of 
number right; this score is expressed in centile rank from a norm for 
high school graduates. The centile ranks are further interpreted into 
the larger scopes of a six letter scale of A, B, C+, C-, D, and E. The 
highest decile is classified A, the lowest decile E, the centiles from 
seventy to eighty-nine are given a B classification, from fifty to 
sixty-nine a C+, from thirty to forty-nine a C-, and from ten to 
twenty-nine a D classification. The six Seashore tests are interpreted 
in the form of a talent profile comprising the six centile ranks in 
pitch, intensity, time, consonance, tonal memory and rhythm, and these 
talent profiles are classified empirically into an A profile, B profile, 
C+, C-, D or E profile. High school graduates and entering students in 
a music school or college music department can be measured for musical 
talent and comprehension, and these two factors can then be interpreted 
in terms of the letters A to E. 
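
The letter scale just described can be sketched as a small lookup. This is only an illustration in Python and not part of the original test procedure; the function name `centile_to_letter` and the accepted input range of 0 to 100 are assumptions. It applies to a single centile rank. The talent profile as a whole is classified empirically, which the sketch does not attempt.

```python
def centile_to_letter(centile: int) -> str:
    """Map one centile rank to the six letter scale A, B, C+, C-, D, E.

    Boundaries follow the text: top decile A, 70-89 B, 50-69 C+,
    30-49 C-, 10-29 D, lowest decile E. The 0-100 input range is an
    assumption made for this sketch.
    """
    if not 0 <= centile <= 100:
        raise ValueError("centile rank must lie between 0 and 100")
    if centile >= 90:
        return "A"
    if centile >= 70:
        return "B"
    if centile >= 50:
        return "C+"
    if centile >= 30:
        return "C-"
    if centile >= 10:
        return "D"
    return "E"


if __name__ == "__main__":
    for rank in (95, 72, 55, 41, 18, 4):
        print(rank, centile_to_letter(rank))
```
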

A study of the significance of these varying degrees of talent and 
comprehension began in 1925 in an actual situation in a university 
music school. From data accumulated and from observation it was 
noted that certain students could be depended upon to achieve the 
four year program creditably and with satisfaction to themselves 
and the faculty of the school. This type of student could well be 
considered a safe academic risk, hence it seemed natural to designate 
these types as a Safe group. Another group had less of a margin of 
safety and yet under certain conditions would probably succeed in 
making satisfactory musical progress. These students naturally 
fell into a Probable group. A third group of students would find 
musical progress possible but the odds against them would be greater, 
hence the designation for this group of Possible. A fourth group 
consisted of those who were doubtful risks, those who for various 
reasons would not, with few exceptions, carry the work of the course 
with sufficient credit or satisfaction to warrant the effort involved. 
Doubtful is the natural term to designate this group. There is 
another group, those who are obviously not fitted to carry on regular 
course work in a music school. The odds against them in this 
particular field are too great to justify encouragement. This group, 
therefore, might well be called Discouraged. Hence, we have a five- 
fold classification of Safe, Probable, Possible, Doubtful, and Discouraged. 

To be of real service, however, such a classification must lend 
itself to prediction. In order to learn its predictive value a detailed 
study of three entering classes, comprising three hundred fifty-one 
students, was made at the end of the first semester’s work. The information 
used in the first attempt to predict that one student was Safe for the 
course and another was Doubtful for the course, et cetera, consisted 
of three distinct factors, (a) the two test scores for the musical capacity 
measures and the comprehension test, (b) teacher’s first semester 
ratings in talent and achievement for instrument and voice made 
without knowledge of the test results, and (c) the students’ first 
semester class marks. 

From the observation of these factors for each student at the end 
of the first semester each was assigned to one of the five groups, Safe, 
Probable, Possible, Doubtful, or Discouraged, this classification 
indicating the probability of individual continuance in the course. In 
making this observation the class marks with the emphasis on the 
subject of theory were considered first, teachers’ talent ratings, second, 
and test scores, third. These estimates were made in cooperation 
with Mr. Melville Smith who organized and taught the theory course 
for freshmen at that time. This cooperation was advantageous for 
two reasons, one was Mr. Smith’s definite knowledge of the theory 
work of all the students, and the other was his sincere interest in the 
students’ accomplishment and grasp of the subject as presented. 
The frequencies of each estimate of Safe, Probable, et cetera, for the 
students of each class approached a normal curve of distribution. 
The next step was the discovery of test combinations for each of 
the estimated groups. What were the group classifications of students 
with a test combination of A in talent and A in comprehension? It 
was found that most of the A, A students had been given a classifica- 
tion of Safe, with a few Probable, hence an A, A student was considered 
a Safe student. Likewise, the B, B students had a majority of 
Probable classification, hence a B, B student was considered Probable. 
The majority of C+, C+ students were given the Possible classifica- 
tion, thus a C+, C+ student was considered Possible. By noting 
the test combinations and the classification given the majority of 
students the Cumulative Key was formulated as shown in Table I. 
Each different test combination was placed in the five-fold group 
where it had occurred the greatest number of times except for a few 
with frequencies insufficient for location, in these cases arbitrary 
locations were given. The result of this arrangement presented itself 
in as consistent order as one might have made theoretically but 
might not have anticipated from actual findings. According to the 
Cumulative Key, the A talent with its six degrees of comprehension 
occurs in the three groups of Safe, Probable and Possible, the B 
talent, in the four groups from Safe to Doubtful, the C+ talent in 
the three groups of Possible, Doubtful, Discouraged, the C— talent, 
in the two groups of Doubtful and Discouraged. As a result of earlier 
experimental studies the D and E talent were not admitted to the 
music school for the four-year course of study leading to the music 
degree; therefore, the D and E talent are not included in the 
Cumulative Key. 
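
The use of the Cumulative Key can be sketched as a table lookup. The sketch below is an illustration only: the cell placements are those of Table I and of the description above, while the names `CUMULATIVE_KEY` and `classify` are assumptions introduced here. D and E talent are rejected because, as stated, such students were not admitted to the four-year course.

```python
# Test combinations (talent profile letter, comprehension letter) per group,
# transcribed from Table I. "C-" stands for C minus.
CUMULATIVE_KEY = {
    "Safe": [("A", "A"), ("A", "B"), ("A", "C+"), ("B", "A")],
    "Probable": [("A", "C-"), ("A", "D"), ("B", "B"), ("B", "C+")],
    "Possible": [("A", "E"), ("B", "C-"), ("B", "D"),
                 ("C+", "A"), ("C+", "B"), ("C+", "C+")],
    "Doubtful": [("B", "E"), ("C+", "C-"), ("C+", "D"),
                 ("C-", "A"), ("C-", "B")],
    "Discouraged": [("C+", "E"), ("C-", "C+"), ("C-", "C-"),
                    ("C-", "D"), ("C-", "E")],
}

# Invert the key so that each test combination points to its group.
GROUP_OF = {
    combination: group
    for group, combinations in CUMULATIVE_KEY.items()
    for combination in combinations
}


def classify(talent: str, comprehension: str) -> str:
    """Return the five-fold group for one (talent, comprehension) pair."""
    try:
        return GROUP_OF[(talent, comprehension)]
    except KeyError:
        raise ValueError(
            f"combination {talent}, {comprehension} is not in the key "
            "(D and E talent are not included)"
        ) from None


if __name__ == "__main__":
    print(classify("A", "A"))    # Safe
    print(classify("B", "E"))    # Doubtful
    print(classify("C-", "C+"))  # Discouraged
```

The inverted dictionary holds twenty-four entries, one for each pairing of the four admitted talent letters with the six comprehension letters.
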

Since the first organization of the Cumulative Key with three 
hundred fifty-one students, the students from four additional entering 
classes have been classified, on the basis of first semester achievement 
and the frequencies of their test combinations added to the original 
cumulative key bringing the total number included to nine hundred 
seventy-eight, or approximately one thousand, students in seven 
classes. This number should give a certain permanency to the key. 
There has been no change in the location of each test combination 
under its appropriate five-fold group with the addition of more 
frequencies. However, there have not been enough test combinations 
for students with low comprehension scores to know where they might 
eventually occur. For instance, there has been only one student with 
an A, E test combination, only seven with an A, D combination, and 
only nine with A, C—. Of the B, E combination there have been 
but ten cases, of the C+, E but eleven, and of the C—, E only three. 
The very few cases of test combinations in which comprehension is 
low carry certain significance. High capacities occur in greater 
numbers than low capacities and for the most part are accompanied 
by higher comprehension. In so far as comprehension is a factor of 
intelligence such information will tend to counteract the often heard 
statement that musicians lack intelligence. Although I do not have 
sufficient facts to refute such a statement, our facts and educational 
procedures up to the present time do not tend to confirm such a state- 
ment for the potential musicians of the future. 

The value of this cumulative key with its suggested five groups 
of decreasing potentialities from the Safe group to the Discouraged 
group has been shown from accumulated student records in a previous 
study, The Prognosis of Musical Achievement.¹ Since the organization 

¹ Stanton, H. M.: “The Prognosis of Musical Achievement,” 1929. Eastman 
School of Music, The University of Rochester, Rochester, N. Y. 

of the cumulative key the students in four entering classes for whom 
prediction was made at entrance have completed the requisite work 
for graduation, or have left the course. Whether or not the prediction 
at entrance is fulfilled by the successful completion of the four-year 
course is a valuable check on the prognostic value of the key. Previous 
to the discussion of this new information, I shall review in general 
some of the facts presented in the Prognosis of Musical Achievement, 
page 30 following, which are the first facts indicating the value of the 
cumulative key for predictive purposes. These items were studied 
for the whole class of 1925, one hundred sixty-four entrants, and also 
for the two contrasting groups, the Safe group of eighty students, and 
the D, D group (Doubtful and Discouraged, combined) of ninety 
students from three entering classes. The items are as follows for the 
whole class of one hundred sixty-four students: 

1. Fourth Year Continuity.—More than one-half of the Safe group 
continued, more than two-fifths of the Probable group continued, 
two-fifths of the Possible group continued, about one-fifth of the 
Doubtful group continued and none of the Discouraged group remained. 
The percentage of those continuing in their fourth year decreases from 
fifty-five per cent in the Safe group to zero percentage in the Dis- 
couraged group. 

2. Dismissals.—From the Safe group to the Discouraged group 
the percentage of those dismissed increased from four per cent for 
the Safe to sixty-four per cent for the Discouraged. 

3. Scholarships and Honors.—There is a preponderance of scholar- 
ships and honors awarded to students in the three upper groups of 
Safe, Probable, and Possible with none awarded to students in the two 
lower groups of D, D. 

4. Kilbourn Hall Recital Appearances.—In performance, instru- 
mental and vocal, the percentage of students appearing in Kilbourn 
Hall for each of the five groups gradually decreases from fifty-seven 
per cent in the Safe group to eighteen per cent in the Discouraged 
group. 

For the extreme groups, the Safe group versus the D, D group 
for the three entering classes, the following facts were discovered: 

1. Annual Continuity.—There is a much greater percentage of the 
Safe group continuing annually than the percentage of the D, D groups. 

2. Dismissals.—Of those who discontinued in the Safe group, 
two, or 11.7 per cent, were dismissed, both of these dismissals occurring 
for disciplinary reasons. In the D, D group forty-two, or 62.6 per cent, 
were dismissed, one case only for disciplinary reasons, and forty-one 
for academic reasons. 

3. Scholarships and Honors.—In the Safe group forty-three per 
cent were holders of scholarships and honors, in the D, D group only 
eight per cent were holders of scholarships and none received honors. 

4. Recital Appearances.—A much larger percentage of the Safe 
students appear in Kilbourn Hall recitals than those of the D, D group. 

Fig. 1.—The percentage of graduation within four years in each of the five 
groups, Safe, Probable, Possible, Doubtful, Discouraged. 
N = five hundred sixty-five graduates from four successive classes in a university 
music school. 

5. Ratio of Hours to Points.—A significant contrast is shown 
between the two extreme groups in the annual ratio of hours to points 
in practical music, theoretical music, and academic subjects. The 
Safe group has the higher ratio. 

6. Representative Talent Profiles.—A noticeable contrast occurs 
between the ten talent test profiles chosen arbitrarily to represent 
each group with those in the Safe group higher. 

7. Mean, Median, Mode.—The mean, median and mode for each 
of the seven tests are decidedly higher for the Safe group than for 
the D, D Group. 

In addition to the various points just reviewed, further evidences 
of the stability and consistency of the cumulative key are shown by 
the new information of graduates from the four entering classes of 
1925, 1926, 1927, and 1928, a total of five hundred sixty-five students. 

These five hundred sixty-five students were classified at entrance to 
a four year course in practical music, theoretical music, and academic 
subjects as one hundred twenty-five Safe, one hundred forty- 
three Probable, one hundred ninety-five Possible, seventy-three Doubt- 
ful, and twenty-nine Discouraged. Within the four years allotted 
to the completion of the regular course, sixty per cent of the Safe 
students were graduated, forty-two per cent of the Probable students, 
thirty-three per cent of the Possible, twenty-three per cent of the 
Doubtful, and seventeen per cent of the Discouraged, Fig. 1. The 
descending series of the percentage graduating is strikingly consistent 
with the individual differences predicted from the cumulative key at 
entrance. In other words, such prediction is prognostic of musical 
achievement not only from the point of view of continuity, dismissals, 
scholarships, honors and performance, but finally according to the 
per cent of those in each group who graduate. 

There is a decided contrast between those who graduate from the 
extreme groups of Safe and D, D (Doubtful and Discouraged) in the 
four classes of five hundred sixty-five entering students, in that sixty 
per cent of the Safe group graduated and only twenty-one per cent 
of the D, D group graduated, three-fifths of the Safe group as com- 
pared with about one-fifth of the D, D group. 
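
The twenty-one per cent quoted for the combined D, D group can be checked against the separate Doubtful and Discouraged figures. The sketch below is a rough illustration, assuming that the combined rate is the entrant-weighted average of the two group rates; because the published percentages are rounded, the graduate counts it implies are approximate. The names `ENTRANTS`, `PER_CENT_GRADUATED` and `combined_rate` are introduced here for illustration.

```python
# Entrants per group and per cent graduating within four years,
# as reported in the text for the classes of 1925-1928.
ENTRANTS = {
    "Safe": 125,
    "Probable": 143,
    "Possible": 195,
    "Doubtful": 73,
    "Discouraged": 29,
}
PER_CENT_GRADUATED = {
    "Safe": 60,
    "Probable": 42,
    "Possible": 33,
    "Doubtful": 23,
    "Discouraged": 17,
}


def combined_rate(groups):
    """Entrant-weighted per cent graduating across several groups."""
    total = sum(ENTRANTS[g] for g in groups)
    graduates = sum(ENTRANTS[g] * PER_CENT_GRADUATED[g] / 100 for g in groups)
    return 100 * graduates / total


if __name__ == "__main__":
    print(sum(ENTRANTS.values()))  # 565 entrants in all
    # Roughly 21 per cent for Doubtful and Discouraged combined.
    print(round(combined_rate(["Doubtful", "Discouraged"]), 1))
```
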

In Fig. 2, the five-fold predictive classifications for seven successive 
entering classes in a university music school are shown by the seven 
columns, one column for each class. All columns are equal in length 
but balanced, so to speak, on a horizontal line which places the Doubt- 
ful and Discouraged groups below the line and the Safe, Probable, and 
Possible groups above the line for each class respectively. As the 
eye follows the tops of these columns a curve is traced which falls 
from Class I to Class II then rises gradually through Class III to 
Class IV, falls again to Class V and rises more gradually through Class 
VI to Class VII. This curve suggests cycles of talent in three year 
groups which may or may not continue with some regularity. In 
Fig. 2, Class II is the lowest and Class IV is the highest which sym- 
bolizes the contrast in these two classes predicted by measurement at 
entrance to the course. In other words, Class II was recognized as 
the lowest entering class according to measurement and Class IV as 
the highest entering class according to the same measurement. The 
most highly concentrated indicator of the success of this prediction 
is furnished by graduation data. The per cents of students graduating 
within four years from these two classes are thirty-four per cent from 
Class II and fifty-seven per cent from Class IV. 

During the progress of these two classes through the course there 
were many indicators to show one class as higher in talent and the 
other as lower in talent. Class II, designated as the lowest in talent 
according to measurement, proved to be the lowest in annual continuity 
and in the per cent of those successfully completing the prescribed 
course for graduation. Class IV, designated as the highest in talent 
from the viewpoint of measurement, proved to be the highest in 
annual continuity and in the per cent of those who successfully 
completed the course, even though this class carried through the 
beginning of the depression. 

Fig. 2.—Cycles of musical talent shown by columns representing seven successive 
entering classes, 1925-1931, in a university music school. 
Numbers at the right of each column indicate the percentage of the respective 
classification in that entering class. 

The logical next step in the follow-up of the scientific significance 
of the prediction according to the cumulative key would be the voca- 
tional and avocational studies of the musical experiences of the Safe 
students as compared with the Doubtful and Discouraged students. 

From the additional history of the students’ completion of the 
course resulting in graduation we have a valuable check on the stabi- 
lity of the cumulative key as originally organized. The higher talent 
stays, the lower talent leaves; the higher talent meets the demands of 
the curriculum, the lower talent does not meet the demands of the 
curriculum. This key then serves not only as a guide for the designa- 
tion of musical talent but also as a guide for the prognosis of musical 
achievement. Its use gives an index of the annual status of the 
musical talent in a school, college or university. 


TABLE I.—THE CUMULATIVE KEY. TEST COMBINATIONS OF MUSICAL TALENT 
AND COMPREHENSION GROUPED INTO A FIVE-FOLD CLASSIFICATION. THE 
FIRST LETTER IS THE CLASSIFICATION OF THE MUSICAL TALENT PROFILE, 
THE SECOND LETTER IS THE CLASSIFICATION OF THE COMPREHENSION 
TEST SCORE 


Discouraged    Doubtful    Possible    Probable    Safe 
C+, E          B, E        A, E        A, C-       A, A 
C-, C+         C+, C-      B, C-       A, D        A, B 
C-, C-         C+, D       B, D        B, B        A, C+ 
C-, D          C-, A       C+, A       B, C+       B, A 
C-, E          C-, B       C+, B 
                           C+, C+ 


APPRECIATION OF LITERATURE AND ABSTRACT 
INTELLIGENCE 


HERBERT A. CARROLL 


University of Minnesota 


That linguistic ability and the kind of intelligence measured by 
the usual verbal intelligence test are correlated to a high degree has 
long been known; that esthetic ability and verbal intelligence are 
correlated only slightly, if at all, is also a widely held belief. It 
becomes, then, a very interesting question as to what happens when 
the linguistic and the esthetic are combined as they must necessarily 
be in appreciation of literature. Is this complex ability related to 
abstract intelligence, or does the emotional element involved offset 
the intellectual? In other words, do superior children enjoy the 
same quality of reading materials relished by the dull? Is literary 
judgment affected by intelligence? 


MEASURING INSTRUMENTS 


All the children included in this study had previously been tested 
with either the Pressey Senior Classification Test or the Terman 
Group Test, the former having been given to the junior high school 
pupils and the latter to the senior high school pupils. The derived 
intelligence quotients were taken, by the writer, from the school records. 
Undoubtedly, some of the individual ratings are in error, but since 
this investigation is concerned solely with group relationships, these 
would affect the findings but little. The instrument used in measuring 
the ability to appreciate literature was the Carroll Prose Appreciation 
Test,¹ a detailed description of which appears in the magazines 
referred to in the footnotes. This test was validated against three 
criteria: source, expert opinion, and comparative performances of 
groups on different educational levels. The test is composed of a 
number of sets of short selections, ten such sets appearing in the 





¹ Carroll, Herbert A.: “Carroll Prose Appreciation Test.” Minneapolis and 
Philadelphia: Educational Test Bureau, 1932. 

Carroll, Herbert A.: A Standardized Test of Prose Appreciation for Senior 
High School Pupils. The Journal of Educational Psychology, Vol. XXIII, No. 6, 
pp. 401-410. 

Carroll, Herbert A.: A Standardized Test of Prose Appreciation for Junior 
High School Pupils. The Journal of Educational Psychology, Vol. XXIII, No. 8, 
pp. 604-606. 

junior high school form, and twelve in the senior high school form. 
Each of the sets is made up of four selections of varying literary worth, 
the first choice being an excerpt from a book by an author of estab- 
lished fame, the second choice an excerpt from a book by a writer 
generally considered as second-rate, the third choice an excerpt from 
a story found in one of the less literary magazines, and the fourth 
choice a mutilation. 

The test was further validated against the judgments of sixty-five 
experts, a consensus of whose opinion is, for each selection used, in 
complete agreement with the source. An additional proof of the 
validity of the instrument lies in the fact that there is a reliable increase 
in average score for each grade step from Grade VII through Grade 
XII, that the mean score earned by college students is strikingly 
higher than that earned by high school students, and that teachers of 
English and professional literary critics are, in turn, much superior to 
college students in their ability to score well on the test. 

The reliability coefficients of the two forms of the instrument are 
as follows: (1) Junior high school, .70; (2) senior high school, .71. 

The test used, then, rests upon the assumption that the ability 
to appreciate literature can be measured by revealing the degree to 
which an individual discriminates among passages of varying worth. 
Its validity and reliability have been carefully established. 


GATHERING DATA 


Given the measuring instruments the procedure in gathering 
data was, of course, simple. Six hundred junior high school and two 
hundred seventy senior high school pupils, for whom intelligence 
ratings were available, were tested with the Carroll Prose Appreciation 
Test. These children came from five representative schools in 
Minneapolis and St. Paul. A summary of the findings appears in 
Table I on page 56. 

The product-moment correlations appearing in Table I indicate 
a marked degree of relationship between intelligence and the ability 
to appreciate literature. On the whole, they are only slightly lower 
than those ordinarily found between intellectual capacity and achieve- 
ment in a subject such as history or mathematics. 
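
The probable errors printed beneath the correlations in Table I can be approximated with the conventional formula PE = 0.6745(1 - r²)/√N. The article does not state which formula was used, so its use here is an assumption, and the function name `probable_error_of_r` is introduced only for illustration. The sketch agrees with most of the printed values to the third decimal place; one or two differ by a unit in the last place.

```python
import math


def probable_error_of_r(r: float, n: int) -> float:
    """Conventional probable error of a product-moment correlation.

    PE = 0.6745 * (1 - r**2) / sqrt(n). That the author used this
    formula is an assumption of the sketch.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


if __name__ == "__main__":
    # (r, N) pairs read from Table I: Grade VIII, Grade IX, Grade X,
    # and the senior high school total.
    for r, n in [(0.27, 200), (0.31, 200), (0.46, 86), (0.43, 270)]:
        print(r, n, round(probable_error_of_r(r, n), 3))
```
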


COMPARISON OF EXTREMES 


Since the correlation coefficients appearing in Table I indicate 
only indirectly the extent to which children of superior and inferior 
intelligence differ in their ability to appreciate literature, it is desirable 
to analyze the appreciation scores of children at either extreme in 
intelligence. Of the high school pupils examined, all those whose 
IQ’s were 120 or over were put into one group and those whose IQ’s 
were below 100 into another. Their scores on the Carroll Prose 
Appreciation Test were then tabulated and compared. (See Table II.) 


TABLE I.—RELATIONSHIP BETWEEN APPRECIATION OF LITERATURE AND ABSTRACT 
INTELLIGENCE ON HIGH SCHOOL LEVEL 


Grade                           VII     VIII       IX   Total,        X       XI      XII   Total, 
                                                        J.H.S.                              S.H.S. 
N                               200      200      200      600       86       89       95      270 
r (with PE)             [illegible]      .27      .31      .41      .46      .43      .48      .43 
                              ±.042    ±.044    ±.043    ±.022    ±.057    ±.059    ±.053    ±.033 
r (corrected for 
  attenuation)                  .42      .34      .39      .52      .58      .54      .60      .54 


TABLE II.—COMPARISON OF INTELLECTUALLY SUPERIOR AND INTELLECTUALLY 
INFERIOR CHILDREN IN APPRECIATION OF LITERATURE 


                                   Junior high school        Senior high school 
                                 Below 100     120 IQ      Below 100     120 IQ 
                                    IQ        and over        IQ        and over 
N                                   59           52           58           57 
Mean (IQ) (with PE)               94.74       130.29        96.07       128.60 
                                  ±.485        ±.896        ±.392        ±.625 
S.D.                               5.52         9.60         4.44         7.04 
Mean (literature appreciation)    18.93        28.85        30.57        44.71 
                                  ±.638        ±.870        ±.841        ±.627 
S.D.                               7.29         9.47         9.51         7.05 


The difference in appreciation of literature scores between children 
who are very superior in intellectual ability and those who are below 
average is, as shown in Table II, very great. The gap between the 
means for the two junior high school groups is 9.92 points with a 
probable error of only 1.08, and between the two senior high school 
groups 14.14 with a probable error of 1.03. These differences obvi- 
ously possess high statistical reliability. 
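
The junior high school figures can be reproduced from Table II on the assumption that the probable error of a difference between two independent means is the square root of the sum of the squared probable errors. The article does not state its formula, so this is a sketch under that assumption, and the name `pe_of_difference` is introduced for illustration. The same rule applied to the senior high school entries gives about 1.05 rather than the printed 1.03.

```python
import math


def pe_of_difference(pe_a: float, pe_b: float) -> float:
    """Probable error of the difference of two independent means.

    Assumes PE_diff = sqrt(PE_a**2 + PE_b**2).
    """
    return math.hypot(pe_a, pe_b)


if __name__ == "__main__":
    # Junior high school appreciation means and probable errors (Table II).
    gap = 28.85 - 18.93
    pe = pe_of_difference(0.638, 0.870)
    print(round(gap, 2), round(pe, 2))
    # Ratio of the difference to its probable error.
    print(round(gap / pe, 1))
```
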

In terms of overlapping, only a small percentage of the intellectu- 
ally superior pupils fall below the mean appreciation score of the 
dull, while, conversely, a relatively small number of the intellectually 
dull equal or exceed the median appreciation score earned by the 
bright. Exact data appear in the two following statements: (1) 
On the junior high school level 11.7 per cent of those with intelligence 
quotients below one hundred equal or exceed the median of those 
with intelligence quotients above one hundred twenty, while 10.93 
per cent of the intellectually superior group fall below the median of 
the intellectually inferior group; (2) on the senior high school level, 
only 6.55 per cent of those with intelligence quotients below one 
hundred equal or exceed the median of those with intelligence quotients 
above one hundred twenty, while 2.89 per cent of the intellectually 
superior group fall below the median of the intellectually inferior 
group. 


CONCLUSIONS 


Eight hundred seventy junior and senior high school students were 
tested with either the Pressey Classification or the Terman Group 
Test and with the Carroll Prose Appreciation Test to discover whether 
or not intellectual capacity and appreciation of literature tend to be 
found together. The data definitely show that they do, the brighter 
children possessing much better literary judgment than the dull. 
This has been shown in three ways: (1) By marked correlations 
between scores earned on tests of the two variables; (2) by statistically 
significant differences between the average performances on the 
appreciation tests of intellectually superior and intellectually inferior 
groups; (3) by the very small percentage of dull children who equal 
or exceed the median appreciation score of the bright or of bright 
children who fall below the median for the dull. 

The ability to appreciate literature rests to a considerable degree 
upon comprehension, and comprehension, in turn, of course rests 
upon intellectual capacity. Though this statement goes beyond the 
data herein presented, I venture to say that it is very unlikely that 
any individual of low intelligence ever truly appreciates the best in 
literature, or that he can be taught to do so. 


FACTOR-ANALYSIS TECHNIQUES APPLIED TO 
PUBLIC-SCHOOL PROBLEMS 


W. LINE, K. H. ROGERS, AND E. KAPLAN 


University of Toronto 


I. INTRODUCTION 


The adjustment to school curricula is, without question, an 
important aspect of child development. It is immediately important 
to the child, particularly at the public-school level, since all his social 
contacts are influenced thereby. It is of great significance to parents, 
who are, in most cases, watching for the first time the progress made by 
their children side by side with others, in attacking increasingly difficult 
tasks. It is the central feature of the educator’s purpose, in that it 
reflects the degree of intellectual achievement fostered by the school. 
When progress is apparently normal—normality being formerly 
viewed in terms of a chronological-age reference, but in recent years 
more happily involving a mental-age comparison—the attitude of 
the teacher or parent is, on the whole, one of satisfaction. But repeat- 
edly there appears inhibited progress out of harmony with intelligence 
measures, regularity of attendance, physical health, pedagogical and 
educational effort and efficiency; and it is then that perplexity finds 
relief in comparatively useless alibis, such as “special abilities and 
disabilities,” “likes and dislikes,” “laziness,” “lack of application,” 
and so on. 

Clinical examination of many individual cases in this connection 
not only substantiates the view that inadequate achievement indicates 
personality imbalance, but also points to the fact that some order 
may be observed in the type of disparity-pattern occurring, relative 
to the mental habits of the child. Instead of all-round backwardness, 
as in cases of inadequate intellectual calibre, a patterning of failures 
and successes seems to occur, as if the demands made by the different 
school-subjects were psychologically distinct in some respects. Fail- 
ure in Reading occurs frequently with failure in Spelling, for example, 
but appears to be less intimately related to Arithmetic—although 
this would not be equally true in all educational settings. Similarly 
Literature and Geography usually have a closer relationship than is 
the case with History and Geography. The question arises, therefore, 
as to the possibility of clarifying the connection between disparity- 
patterns and habits or tendencies of thought. The latter may be 
referred back to home or extra-school environment, or to other social 
influences—a direction of search that has always characterized clinical 
case methods. But such references are never final. They must be 
interpreted eventually in terms of the child’s mental tendencies and 
attitudes themselves; and this procedure depends upon a system of 
psychological concepts framed by the clinician to direct his interpreta- 
tions. Could, therefore, supplementary studies approach these 
problems a little more directly, by examining the thought life of the 
child in relation to curricular adjustment on the one hand, and social 
background on the other? Success in this approach would undoubt- 
edly assist in guiding curricular revision, in relating educational 
procedure more harmoniously to a comprehensive mental hygiene 
outlook, and in interpreting more definitely the relationship between 
the various social institutions in fostering the healthy development 
of children. 

The studies here reported are made from this point of view. They 
form a part of an analysis of educational situations, in which the 
initial task is to arrive at a method adequate to our purpose. It will 
be seen from what follows that the method pertains to the field of 
differential psychology, and involves, at this stage, an exploratory 
evaluation of factor-analysis techniques in this connection. Par- 
ticularly are we concerned with the harmony between the comple- 
mentary approaches of Spearman and Thurstone, their relationships 
to the problems above outlined, and some of their limitations. 


II. PROCEDURE 


The academic records of eighty fifth-grade pupils (average age 
ten years) were examined in relation to certain psychological measures. 
Examination marks representing the final standing for the year in 
each subject were given by the teacher, and these were taken as a 
possible basis for determining what standards or demands had to be 
met by pupils in this particular setting. The psychological measures 
employed were as follows: 

(I) “Intelligence Test’’ scores. (Pintner.) 

(II) ‘‘Perseveration”’ or p scores. These were obtained by using 
a battery of tests similar to that employed by Pinard.® 

(III) ‘Alertness’ tests, consisting of very easy intellectual tasks 
under conditions of maximum speed.’ 

(IV) ‘Speed of Execution”—the number of letters, digits, etc., in 
familiar sequences (e.g. 1, 2, 3, 4, etc.), written in short-time periods. 
(V) “Fluency.” These tests consisted in the number of different 
ideas that could be written in five minutes, according to the techniques 
employed by Hargreaves.? 

In the case of each of the batteries II to IV, definite indication
had been obtained of a group factor, at least within the range of the
material employed; so that, in addition to g (presumably present in
the “Intelligence” measures) three largely independent factors were
involved. Measure V, “Fluency,” seemed to be related to both
“Perseveration” and “Speed of Execution.” (See Table of Correlations,
below.)


III. RESULTS 


Intercorrelations were calculated throughout the whole table of 
variables, with the following results: 








TABLE I

                             A     B     C     D     E     F     G     H     I     J     K     L     M     N
A. Total standing          ...   .87   .83   .69   .72   .62   .67   .53   .61   .40   .13   .09  −.23   .20
B. Composition             .87   ...   .64   .65   .56   .43   .52   .46   .50   .39   .23   .11  −.14   .20
C. History                 .83   .64   ...   .39   .58   .57   .52   .43   .58   .25   .12   .11   .09   .10
D. Spelling                .69   .65   .39   ...   .44   .22   .29   .62   .19   .37   .28   .12  −.16   .31
E. Literature              .72   .56   .58   .44   ...   .55   .34   .36   .53   .15  −.05   .00  −.04  −.04
F. Pintner intelligence    .62   .43   .57   .22   .55   ...   .42   .35   .44   .13   .09   .02  −.10  −.01
G. Arithmetic              .67   .52   .52   .29   .34   .42   ...   .20   .39   .22  −.11   .07  −.37   .08
H. Reading                 .53   .46   .43   .62   .36   .35   .20   ...   .07   .19   .13   .23  −.10   .32
I. Geography               .61   .50   .58   .19   .53   .44   .39   .07   ...   .12   .01  −.09   .04  −.16
J. Writing                 .40   .39   .25   .37   .15   .13   .22   .19   .12   ...   .37   .31   .05   .33
K. Fluency                 .13   .23   .12   .28  −.05   .09  −.11   .13   .01   .37   ...   .11   .41   .23
L. Alertness               .09   .11   .11   .12   .00   .02   .07   .23  −.09   .31   .11   ...   .13   .17
M. Speed of execution     −.23  −.14   .09  −.16  −.04  −.10  −.37  −.10   .04   .05   .41   .13   ...  −.25
N. Perseveration           .20   .20   .10   .31  −.04  −.01   .08   .32  −.16   .33   .23   .17  −.25   ...

In this table, “Perseveration” scores (N) were reversed to give more positive correlations.
Variable N is therefore more truly “Non-perseveration.”


From these values, tetrad-differences were calculated. The
distribution of tetrad-differences was as follows:

              Plus                            Minus
Tetrad-differences   Frequency    Tetrad-differences   Frequency
     .40-.44              1            .00-               1313
     .35-.39              3            .05-                774
     .30-                10            .10-                492
     .25-                38            .15-                247
     .20-               125            .20-                125
     .15-               247            .25-                 38
     .10-               492            .30-                 10
     .05-               774            .35-                  3
     .00-              1313            .40-                  1

[row label illegible] ............................ .062
[row label illegible] ............................ .038

Before discussing these results, we may proceed to the next point
in the evaluation of the data. Thurstone’s Multiple Factor technique
was applied to Table I, and four factors were indicated as
underlying the variables. The loadings of these factors were as
follows:

TABLE II

Variable   Factor I   Factor II   Factor III   Factor IV
   A          .94        −.14        −.38         −.03
   B          .84         .02        −.07         −.10
   C          .78        −.17         .09          .07
   D          .71         .28        −.05         −.01
   E          .66        −.54        −.14          .30
   F          .62        −.53         .03          .15
   G          .59        −.05        −.87          .07
   H          .63         .20         .02          .07
   I          .54        −.61         .10         −.47
   J          .54         .36         .14         −.04
   K          .33         .31         .52          .00
   L          .29         .35         .15          .10
   M         −.09        −.02         .87          .38
   N          .35         .51        −.07         −.15

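As a rough check on the tally of tetrad-differences, the following Python sketch forms the three tetrad-differences for every set of four variables and interpolates a median from the grouped frequencies. The function names are illustrative and are not taken from the study. The sketch assumes that each class interval is .05 wide and that the nine frequencies describe the absolute values of the distinct tetrad-differences among the fourteen variables.

```python
from itertools import combinations
from math import comb


def tetrad_differences(r):
    """Return the three tetrad-differences for every set of four variables.

    r is a square, symmetric table of correlations (a list of lists).
    """
    diffs = []
    for a, b, c, d in combinations(range(len(r)), 4):
        diffs.append(r[a][b] * r[c][d] - r[a][c] * r[b][d])
        diffs.append(r[a][b] * r[c][d] - r[a][d] * r[b][c])
        diffs.append(r[a][c] * r[b][d] - r[a][d] * r[b][c])
    return diffs


def grouped_median(lower_bounds, freqs, width):
    """Median of grouped data by linear interpolation within the median class."""
    half = sum(freqs) / 2
    running = 0
    for low, f in zip(lower_bounds, freqs):
        if running + f >= half:
            return low + (half - running) / f * width
        running += f
    raise ValueError("empty distribution")


# Frequencies of absolute tetrad-differences for classes .00-, .05-, ..., .40-
FREQS = [1313, 774, 492, 247, 125, 38, 10, 3, 1]
LOWER = [0.05 * i for i in range(len(FREQS))]

n_tetrads = 3 * comb(14, 4)  # three tetrads for each set of four variables
median_abs = grouped_median(LOWER, FREQS, 0.05)

# With a single common factor (r_ij = l_i * l_j) every tetrad-difference vanishes.
loadings = [0.9, 0.8, 0.7, 0.6, 0.5]  # hypothetical loadings, for illustration only
one_factor = [[li * lj for lj in loadings] for li in loadings]
one_factor_diffs = tetrad_differences(one_factor)
```

On these assumptions the count comes to 3,003, which equals the sum of the nine frequencies, and the interpolated median is about .062, which agrees with the first of the two summary values printed beneath the distribution.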
Judging by the loadings of the first three of these factors in
the various tests, they appear to be related respectively to g, c (the
inverse of p), and what we have called “speed of execution.” The
identification of the fourth factor is not so easy. In a previous study,
however, “alertness” scores were reported as being somewhat related
to “perseveration” scores in that both high and low “perseverators”
tended definitely towards high “alertness.” The interpretation there
suggested was that “alertness” may in some cases be cultivated as a
compensation for a strong “perseverative” tendency. However
this may be, we may provisionally, and merely for convenience,
adopt the name “alertness” for the fourth factor.
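If the four factors are taken to be uncorrelated, the loadings in Table II imply a correlation for each pair of variables (the sum of the products of their loadings) and a communality for each variable (the sum of its squared loadings). The sketch below, in Python with illustrative function names, applies this to Literature (E) and Geography (I). It illustrates the arithmetic only and is not a reconstruction of the authors' own computing routine.

```python
# Loadings on Factors I-IV, copied from Table II.
LOADINGS = {
    "E": (0.66, -0.54, -0.14, 0.30),   # Literature
    "I": (0.54, -0.61, 0.10, -0.47),   # Geography
}


def reproduced_correlation(x, y):
    """Correlation implied by orthogonal factors: sum of loading products."""
    return sum(a * b for a, b in zip(x, y))


def communality(x):
    """Share of a variable's variance carried by the factors."""
    return sum(a * a for a in x)


r_EI = reproduced_correlation(LOADINGS["E"], LOADINGS["I"])
h2_E = communality(LOADINGS["E"])
print(f"reproduced r(E, I) = {r_EI:.2f}, communality of E = {h2_E:.2f}")
```

The reproduced value of about .53 agrees with the Literature-Geography coefficient entered in Table I, and the communality of about .84 shows how nearly the four factors exhaust the variance of the Literature marks.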

Comparing the results of the tetrad technique with those obtained
by Thurstone’s multiple factor analysis, we note that the variables
“perseveration,” “speed of execution,” and “alertness” were added
because they had already given significant indications of factors
differing from g. It was for this reason that they were included in the
correlation table. (See above.) Had they not been present, the
analysis of the table, including Intelligence and School variables
only, would have indicated but one factor, by Thurstone’s criteria, or
two at the most. Their presence, originating in Spearman’s synthetic
method of building a correlation table, made possible a more exhaustive
factorial analysis, although the significance of the subsidiary
factors cannot yet be stated.* In so far as the techniques are comparable
in terms of these results, there is perfect harmony; for,
while four factors were put into the table (by Spearman’s method),
four were isolated by the Thurstone analysis. So, too, the tetrad-differences
that approach significance may be predicted from the
factorial loadings recorded in Table II. (See, for example, the
tetrads involving Reading-Spelling, Literature-Geography correlations,
etc.)

* (a) It appears that the conditions under which a Table of Correlations is
exhaustively analysed by Thurstone’s technique are: (I) the absence of any remaining
“pivot” tests; (II) the indication of completeness where the sum of the
loadings-squared equals (or approximates) unity in the case of each variable; (III) the
insignificant size of each loading of the last-derived factor. (b) In connection
with the last point, there appears to be no way yet devised for determining the
PE values of the loadings, so that their true significance is hard to estimate.

Of significance to Education would appear to be the fact that the
subject-standards reflect g to a fairly large, though varying degree. In
passing, we would draw attention to the fact that, despite considerable
g-weighting in many of the subjects, achievement and intellectual
calibre do not seem to be correlated as highly as educational efficiency
demands. When the psychological nature of the other significant
factors contributing to the subject-scores has been more carefully
revealed, it should be possible to understand more fully the reasons
for their presence, reasons that would illuminate not only the
thought-tendencies of the child, but also the nature of the mental processes
demanded by these various school procedures.*

Of significance to test-construction is the fact (shown in Table II)
that the “intelligence” test scores seem to be loaded to a considerable
degree with “perseveration.” From this, we might expect that the
variation in IQs commonly found when different group tests are used
might be clarified if other so-called “intelligence” measures had been
included. Factor analysis might then indicate some source of discrepancy
between ratings on the same children obtained by different
tests, quite apart from unreliability as ordinarily determined. In
other words, the validity of such tests might be examined in further
detail by the techniques here under discussion.

This suggestion is very far reaching in its scope. In using an
outside criterion of validation (total school standing, for example)
two tests of “intelligence” may appear to be equally valid. They
may correlate equally highly and consistently with such a criterion,
since the latter may be of so composite a nature as to cloak many of the
psychological variables that are significant to its parts. The tests
may also be equally reliable. Further, they may satisfy the conditions
of two-factor division in certain statistical settings. Yet they may
at the same time manifest characteristics that involve factors other
than g to quite varied degrees.†

Because of this possibility, the comparison between the results of
factor analysis with two tests of “intelligence” was undertaken. The
additional data included scores on the National Intelligence Test,
Form A, which were then correlated with the fourteen other variables;
and the coefficients were added to the table of correlations. These
coefficients, given in the order A, B, . . . N (Table I) were as follows:
.48, .39, .45, .20, .33, .78, .34, .54, .28, .17, .13, .382, −.03, −.10. The
enlarged table of correlations was analysed by Thurstone’s technique,
giving loadings of four factors as before.

* The results here reported have been extended in this direction, and are
embodied in an article by the second author, entitled “Intelligence and Perseveration
Related to School Achievement.” (Shortly to be published.)

† The two-factor technique would, of course, detect this fact if used systematically
and carefully over a wider range of comparable variables.

The new analysis gave loadings pertaining to the original fourteen
variables that were essentially the same as those recorded in Table II.
Only minor evidences of disturbance appeared, due to the inclusion
of the new variable. But an interesting contrast was that between
the two “intelligence” tests. The results were:

                                   Factor I   Factor II   Factor III   Factor IV
Pintner test (second analysis)       .67        −.61         .07          .00
Pintner test (first analysis)        .62        −.53         .03          .15
National test                        .63        −.44     [illegible]      .43
















It must be remarked that these two tests differ in the conditions under 
which they are administered in one main feature. The time allot- 
ments for the various groups of items are very generous in the Pintner 
examination. In every case, all the children had finished the page 
before time was up; and almost all of them appeared to have completed 
any checking of their answers that they cared to do. The National 
Test, on the other hand, imposes a time limit that is never adequate 
to allow completion of any particular set of examples, even with the 
brighter pupils at this level. This may be somewhat reflected in the
weightings of factor two, “perseveration,” since the difference, though
small, is in the direction to be expected under these conditions. It
certainly appears to be manifested in the loadings of the fourth
factor, “alertness,” where the difference is quite marked. Variation in IQ
ratings obtained by different group tests, each of which is highly
reliable, may be partially explained on this basis; and certainly the
old controversy as to whether such tests measure “speed” or “power”
can be illuminated by more careful and objective definition of those
two qualities in factorial terms.*


* In another connection, and with a different sample of pupils at this level, the
National Test gave, on analysis, the loadings .57, −.40, and .22 on the first three
factors respectively. (Factor IV was not considered.) These would seem to
corroborate the above.

Further corroboration is suggested by the following: The eighty pupils were
divided into three groups, (a) those whose National score was .5 sigma greater
than their Pintner score; (b) those whose Pintner score was .5 sigma greater than
their National score; (c) those whose scores on the two tests were equivalent
(in deviation units). This classification was compared with those on the basis of
“perseveration” and “alertness” scores as follows:

                                         “Perseveration”                     “Alertness”
                                  High p     Normal p    Low p        High       Normal     Low
                                  (first     (middle     (fourth      (first     (middle    (fourth
                                  25 per     50 per      25 per       25 per     50 per     25 per
                                  cent)      cent)       cent)        cent)      cent)      cent)
Group (a)                          22.2       55.5        22.2         55.5       44.0        0
  (National score greater)
Group (b)                          55.5       22.2        22.2         11.1       11.1       77.7
  (Pintner score greater)
Group (c)                           7         62          31           24         35         41
  (Pintner and National
  scores equivalent)

The first row reads: 22.2 per cent of Group (a) were in the first 25 per cent of the
“perseveration” distribution, 55.5 per cent were in the middle half, 22.2 per cent
in the fourth 25 per cent, and so on. The indications are that Pintner scores tend
to be higher than National scores for individuals showing high “perseveration”
and low “alertness.” (See Group (b).) The reverse appears to characterize high
“alertness.” (See Group (a).) This again harmonizes with the factor loadings
given above.


IV. SUMMARY


At the outset the question was raised as to the significance of
adjustment to school curricula. Analysis of this adjustment might
throw light upon the thought tendencies of children, and on environmental
and classroom situations in terms of the mental demands they
set up. As techniques available for such studies, those of Factor
Analysis were considered; and, in a practical school setting, the
approaches offered by Spearman and Thurstone were adopted as
illustrations. The essential harmony manifested in the results by
these two different approaches suggests the worthwhileness of the
data thus obtained. These data have been briefly examined in
relation to their educational significance, and their implications for
mental-test construction and interpretation.


CHANGE IN SCORES ON THE PSYCHOLOGICAL
EXAMINATION OF THE AMERICAN COUNCIL ON
EDUCATION FROM FRESHMAN TO SENIOR
YEAR


T. R. McCONNELL 
Cornell College, Mt. Vernon, Iowa 


Much interest has been shown recently in the growth of intelligence 
at the college level. Several studies have provided data on the 
increase in intelligence test scores during this period of education, 
among them the reports by Wright and Rogers.¹

The writer finds no record, however, of any study on gains in 
scores on the Psychological Examination of the American Council on 
Education over the interval of four years residence in college. 

Seventy members of the senior class of 1932 at Cornell College had 
taken the 1927 edition of the Psychological Examination with the 
freshman class in September, 1928. These seventy seniors were 
retested on the 1928 edition of the Psychological Examination in 
April, 1932. Prof. Thurstone has reported in the Educational Record²
the equivalent scores on the yearly forms. Thus, since the seniors 
were tested on the 1928 edition, the original (1927 edition) scores were 
transmuted into 1928 equivalents for comparison. 

The purposes of the study are as follows: 

1. To show the effect of four years’ growth and training in college on
performance on this examination.

2. To discover the relative variabilities of the group as freshmen and as 
seniors. 

3. To show the amount of displacement in rank by tenths of the distribution 
from the freshman to the senior year. 

4. To show the distribution of gains in scores from the earlier to the later 
period. 

5. To discover the relative gain of the lower and higher fifty per cent in the 
first test. 

6. To compare the mean gains of men and women. 


Table I shows the data for comparing the average scores of the 
group as freshmen and as seniors. The difference in means is 40.42. 


¹ Wright, M. B.: The Development of Mental Ability at the College-Adult
Level. Journal of Educational Psychology, Vol. XXII, 1931, pp. 610-628.
Rogers, A. L.: The Growth of Intelligence at the College Level. School and
Society, Vol. XXXI, 1930, pp. 693-699.
² Thurstone, L. L. and T. G.: Psychological Examination for 1928. Educational
Record, Vol. X, 1929, pp. 105-115.

The probable error of this difference, computed by the well-known
formula for the probable error of the difference between means when
correlated, is 2.23. The critical ratio is 18-plus, revealing a practical
certainty of a true difference greater than zero between the scores as 
freshmen and the scores as seniors. 
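The formula alluded to is presumably the usual one for correlated means, in which the probable error of the difference is the square root of P.E.1² + P.E.2² − 2·r·P.E.1·P.E.2. A minimal Python sketch follows. The function names are illustrative, and only the difference (40.42) and its probable error (2.23) are taken from the text; the probable errors of the two separate means are not given in the surviving copy, so none are supplied here.

```python
from math import sqrt


def pe_difference_correlated(pe1, pe2, r):
    """Probable error of a difference between two correlated means."""
    return sqrt(pe1 ** 2 + pe2 ** 2 - 2 * r * pe1 * pe2)


def critical_ratio(difference, pe):
    """Difference expressed in units of its probable error."""
    return difference / pe


cr_gain = critical_ratio(40.42, 2.23)
print(f"critical ratio = {cr_gain:.2f}")
```

Because the same students were tested twice, the correlation term reduces the probable error of the difference, which helps to explain so large a ratio.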

Table I also shows that the difference of 2.2 in standard deviations 
between the freshman and senior test is not statistically significant, 
since it is but .58 of its probable error. There is no reason to believe, 
then, that the variability of the group as seniors, as measured by the
standard deviation, was significantly greater or significantly less than
four years ago.


TABLE I. INCREASE IN SCORES ON THE PSYCHOLOGICAL EXAMINATION BETWEEN
THE FRESHMAN AND SENIOR YEARS

Critical ratio ........................................ 18.12


Since the College assigns students a rank on the psychological test
by tenths, for convenience in thinking of their placement, Table II
is provided to show the amount of displacement by decile intervals
between the two testing times. Nineteen students raised their rank
by an average of 1.78 decile intervals, and nineteen were lowered by
an average of 2.21 decile intervals. The average displacement of the
thirty-eight whose rank changed was two decile intervals. Considering
the entire group of seventy, the average displacement in the senior
from the freshman year was 1.11 decile intervals.


TABLE II. DISPLACEMENT BY DECILE INTERVALS BETWEEN FRESHMAN AND
SENIOR YEARS

Number of decile intervals          Frequencies
        +1
        +2
        +3
        +4
        −1
        −2
        −3
        −4

Average displacement of those who raised their rank (N = 19) ......... 1.78
Average displacement of those who lowered their rank (N = 19) ........ 2.21
Average displacement of all who raised or lowered their rank (N = 38)  2.00
Average displacement of cases (N = 70) ............................... 1.11


Changes in scores for individual students ranged from a gain of 
one hundred one points to a loss of twenty-four. Four seniors’ scores 
were lower than their original standing by two, five, eleven, and 
twenty-four respectively. At the other extreme were three gains of 
ninety-nine, one hundred, and one hundred one. The standard 
deviation of changes in scores was 27.2. It would be extremely 
interesting to know what factors were responsible for such large gains 
on the one hand, and for losses or relatively small gains on the other. 

Table III shows that the mean gain of the women was 11.85 points 
more than that made by the men. The probable error of this differ- 
ence is 6.84, and the critical ratio 1.73, which indicates that the 
chances are eighty-eight in one hundred that the true difference is 
greater than zero. 
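The figure of eighty-eight chances in one hundred can be recovered on the assumption that the difference is normally distributed and that one probable error equals 0.6745 of a standard deviation. The Python sketch below, with illustrative names, converts a critical ratio stated in probable-error units into the chance that the true difference exceeds zero.

```python
from math import erf, sqrt

PE_PER_SIGMA = 0.6745  # one probable error expressed in standard deviations


def chance_true_difference_positive(difference, pe):
    """Normal-curve chance that the true difference is greater than zero."""
    z = (difference / pe) * PE_PER_SIGMA
    return 0.5 * (1 + erf(z / sqrt(2)))


p_women_minus_men = chance_true_difference_positive(11.85, 6.84)
print(f"chances in 100: {100 * p_women_minus_men:.0f}")
```

A ratio of 1.73 probable errors corresponds to a little under 1.2 standard deviations, which is why the difference falls short of the conventional standard for significance.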


TABLE III. COMPARATIVE GAINS OF MEN AND WOMEN BETWEEN THE FRESHMAN
AND SENIOR YEAR

Mean gain of women .................................... 47
P.E. of mean gain of women ............................ 5.72
Mean gain of men ...................................... 35.15
P.E. of mean gain of men .............................. 3.77
Difference ............................................ 11.85
P.E. of difference .................................... 6.84
Critical ratio ........................................ 1.73


The correlation between scores as freshmen and scores as seniors 
was .83 ± .024. Those who were in the upper fifty per cent of the
original distribution made an average gain of thirty-seven, while 
those in the lower half made an average increase of 43.28. 

It is of course impossible to say what the relative influence of 
various factors may have been in this increase of scores. All or a part 
of the following may have been responsible: 

1. Growth in underlying capacity. The results of recent studies on 
the curve of growth of what has been called intelligence make some 
increase due to this factor still plausible at the college level. 

2. Growth in effective use of endowment. For instance, increase 
in score on Part 3 (Analogies) might reflect refinement of habits of 
observation, analysis, and systematic effort to discover relationships.
Score on Part 2 (Artificial Language) might conceivably be bettered 
by acquisition of more effective modes of attack on language learning 
situations. Although equivalent scores for successive years are 
available for the test as a whole, such data have not been published 
for the several parts. Thus it is impossible in this study to indicate 
those sections on which improvement was greatest. 

3. Specific training. An analysis of the test suggests that instruc- 
tion may directly affect the examinee’s performance. Some sections 
are probably more susceptible to this effect than others. It is also 
conceivable that some curricular patterns will have more influence on 
performance than others.

4. Varying sets, emotional states, motivation, and other like 
factors, as well as conditions under which the test is administered, may 
account for some of the change in scores. 

It is probably true that the emotional state of many freshmen 
confronted with an intelligence examination is not conducive to best 
performance. That better poise on the part of the senior examinees 
could have been responsible for a large share of the average increase, 
however, is doubtful. As a matter of fact, it is probable that these 
students as freshmen were more concerned with making good scores 
than they were as seniors. The seniors realized that their score on 
the test was not a crucial matter. It is possible, therefore, that the 
depressing factor of emotional tension in the original test situation was 
offset by the stimulus of strong motivation; and that the emotional 
ease of the later situation was offset by reduced incentive. 


SUMMARY 


1. The average gain of seventy college students in scores on the 
Psychological Examination between the freshman and senior years was 
40.42, a difference which was statistically significant. 

2. There is no evidence that there was a significant difference 
between the variabilities of these students at the two testing times. 

3. There was considerable displacement in rank by decile intervals 
from the freshman to the senior year.

4. Those who were in the lower half of the distribution on the 
original test gained more than those in the upper half. 

5. The correlation between the results of the two tests was .83. 

6. The difference between the gains of the men and the women was 
not statistically significant, although what difference there was 
favored the latter. 


SEX DIFFERENCES IN ACHIEVEMENT IN PHYSICAL
SCIENCE


A. W. HURD 
Institute of School Experimentation, Teachers College, Columbia University 


Sex differences are matters of continuing interest. The present
article gives data from the field of physical science. The instructional
unit in high school physics, “Electric Lighting Systems,” was used with
thirteen hundred twenty-six pupils enrolled in fifty-three classes in
thirty-four schools.

For these comparisons, one hundred thirty-four boys were matched
with an equal number of girls by age, grade, and instructor. Matching
by age-grade status is a rough means of equating groups in intelligence.
In addition, the members of each pair were under the same instruction
for eighteen class periods of forty-five minutes each, or their equivalent.
Differences between test scores in preliminary and final tests, and score
gains, should depend on inherent sex differences or different earlier
training due to sex. The test, which included one hundred eleven items,
had a reliability coefficient of .957 ± .006. The items paralleled the
subject-matter of a unit textbook and specially prepared work sheets,
and instruction was specifically directed toward high scores in the
minimum essentials represented. Table I presents the data.


TABLE I. COMPARISONS OF ACHIEVEMENT SCORES OF BOYS AND GIRLS IN HIGH
SCHOOL PHYSICS

                               Means for        Means for       Differences       Critical
                               boys             girls           in means          ratios
Preliminary test               24.58 ± 1.01*    14.30 ±  .78    10.28 ± 1.05†      9.79
Final test                     75.28 ± 1.45     70.25 ± 1.68     5.03 ± 1.91       2.63
Gain                           50.70 ± 1.46     55.96 ± 1.69     5.26 ± 1.94       2.71
Percentage of possible gain    58.66 ± 1.56     57.87 ± 1.82      .79 ± 2.03        .39

* Standard errors.
† Standard error, correlated measures.


The boys are clearly superior in the preliminary test. They show 
a superiority also in the final test. The mean gain for the girls is 
greater but the mean per cent of possible gain favors the boys slightly. 
The girls start on a lower level but make up part of the initial deficiency 
during the period of instruction. The data suggest that the earlier 
training of the girls had been deficient in this field of work; that they 
made up part of the initial deficiency; but that the boys were still 
superior on the final test, and at least even on the percentage of gain 
it was possible for them to make. 
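The article does not define the “percentage of possible gain.” One reading, assumed here, is the gain divided by the room left for improvement, that is, the one hundred eleven items less the preliminary score, counting one point per item. The Python sketch below applies that assumption to the group means rather than to individual pupils, so it can only approximate the tabled values; the function name is illustrative.

```python
TEST_ITEMS = 111  # number of items in the test, assumed to be the maximum score


def percent_of_possible_gain(preliminary, final, maximum=TEST_ITEMS):
    """Gain as a percentage of the points still available after the preliminary test."""
    return 100 * (final - preliminary) / (maximum - preliminary)


boys = percent_of_possible_gain(24.58, 75.28)
girls = percent_of_possible_gain(14.30, 70.25)
print(f"boys {boys:.2f}, girls {girls:.2f}")
```

Computed this way the two figures fall within a few hundredths of the 58.66 and 57.87 reported in Table I, which suggests, without proving, that the measure was defined along these lines. It also shows why the girls' larger raw gain does not translate into a larger percentage: starting lower, they had more room to gain.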


BOOK REVIEWS


C. A. Mace. The Psychology of Study. New York: Robert M. 
McBride & Co., 1933, pp. VIII + 96. 


This theoretical essay purports to be concerned with “the utility 
of the psychologist as a mentor at the elbow of the student, and with 
the application of his knowledge to that somewhat specialized, but 
nevertheless important, function of the mind: the function of ‘study.’”
The reader familiar with current research in educational psychology 
is left somewhat disappointed, however, at the meagreness of the 
practical suggestions given, and with the slight attention given to 
experimental findings on such pertinent topics as recitation in learning, 
whole-part learning, and distribution of practice, that should be the 
richest sources for a volume on this topic. 

The terminology tends to follow McDougall’s purposive psychology
with the substitution of “propensities” for “instincts.” Behaviorism
is regarded by the author as “a convenient abstraction.”

After a rather refreshing and promising introduction, the author
swings into a logical and well-outlined treatise on Human Nature.
Emphasizing “the purposes of man” which have their roots in his
“original propensities,” the writer then proceeds to analyze man’s
“complex intellectual apparatus” into its “three principal constituents:
the apparatus of perception, the apparatus of memory, and the
apparatus of constructive thought.” Each of these in turn has a
“triplicity of purpose.” A separate chapter is then devoted to each
of the topics perception, memory and originality. A final chapter is
concerned with “Concentration and the Will to Work” for “The
ultimate source of efficiency in observation, in memory, and in
constructive thought is insatiable curiosity and the will to know.”


DOROTHEA McCARTY.
Fordham University.


Robert S. Woodworth. Adjustment and Mastery. Baltimore:
The Williams and Wilkins Company, 1933, pp. V + 137.


“Adjustment and Mastery,” a volume in the Century of Progress
Series, deals with the problems met by the individual in his effort
to fit in with life situations as he finds them. The difficulties involved
are not easily overcome even by the normal, while for the abnormal
they may be insurmountable. Since the book was written primarily
for the lay reader, it includes much background information on such
topics as individual differences and fundamental desires. The
writer then shows the relationship between these facts and both
minor and major maladjustments. He necessarily concerns himself
largely with exposition, since so few sound recommendations can
be made concerning the treatment of mental and emotional disorders.
Professor Woodworth has the rare gift of combining simplicity
with depth, and of causing a mass of details to take on interest and
unity. He is undoubtedly one of the clearest thinkers in psychology
today. “Adjustment and Mastery” is heartily recommended to
anyone who wants a simple and at the same time scholarly analysis
of problems of adjustment.

HERBERT A. CARROLL.
University of Minnesota.


Leone Chesire, Milton Saffir, and L. L. Thurstone. Computing
Diagrams for the Tetrachoric Correlation Coefficient. Chicago:
The University of Chicago Bookstore, 1933.


This monograph consists of forty-six abacs together with directions
for their use in computing tetrachoric correlations. Each abac
treats a single proportional frequency. The charts are suitable
when the assumption of a normal distribution underlying the variable
is justified. The great saving of time and labor by use of the charts
should be sufficient recommendation to anyone needing to compute a
number of tetrachoric coefficients. The charts are sufficiently
accurate for nearly all purposes. Rarely does an error of one occur
in the second figure. The monograph is lithographed and is thirty
centimeters square.

JACK W. DUNLAP.
Fordham University.


C. G. Jung. Modern Man in Search of a Soul. New York: Harcourt,
Brace and Company, 1933. Pp. IX + 282.


Reading Dr. Jung’s book is a most refreshing experience. He
succeeds in removing from the concept of the unconscious most of
the awe and morbidity in which it has been enfolded. His principal
thesis is stated in his own simple and straightforward language that
“The unconscious is not a demonic monster, but a thing of nature
that is perfectly neutral as far as moral sense, aesthetic taste and
intellectual judgment go. It is dangerous only when our conscious
attitude towards it becomes hopelessly false. And this danger grows
in the measure that we practice repressions. But as soon as the
patient begins to assimilate the contents that were previously
unconscious, the danger from the side of the unconscious diminishes. As
the process of assimilation goes on, it puts an end to the dissociation
of the personality and to the anxiety that attends and inspires the
separation of the two realms of the psyche.”

In this clear-cut manner Dr. Jung discusses dream-analysis, the
problems of modern psychotherapy, the aims of psychotherapy, a
psychological theory of types, the stages of life, a contrast between
the outlooks of Freud and Jung, the archaic man, psychology and
literature, the postulates of analytical psychology, the spiritual
problem of modern man, and the psychotherapists or the clergy.

This latest volume of Dr. Jung is the most refreshing, lucid, sane
presentation of psychoanalysis that this reviewer has thus far encountered.

MAX SCHOEN.
Carnegie Institute of Technology.


A. M. Jordan. Educational Psychology (revised edition). New
York: Henry Holt & Co., 1933. Pp. XVII + 522.


William A. Kelly. Educational Psychology. Milwaukee: Bruce
Pub. Co., 1933. Pp. XIX + 501.


Dr. Jordan’s revision of his Educational Psychology (1928) is a
great improvement on a text that was already first-class. Two new
chapters, “Measurement of Personality Traits” and “Maturity or
Growth,” have been added and the whole has been carefully re-written
and brought up-to-date. It is a more workmanlike and usable book
than the original. Dr. Jordan’s strength lies in the closeness of touch
he keeps with schoolroom practice. He shows his readers the bearings
of research on classroom practices. There may be educational
psychologies showing a bigger grasp of fundamental principles than
Dr. Jordan’s, but there is none that can be so confidently recommended
to teachers in training.


In contrast with Dr. Jordan’s work, which sticks to objective 
and verifiable evidence, we have this new venture by Dr. Kelly, 
which adopts the viewpoint of the mediaeval schoolmen, especially 
that of St. Thomas. Not being a Roman Catholic and having been 
reared in an atmosphere of science rather than of religion, I was 
surprised when I read that “Human beings are composed of body and
soul, made in the image and likeness of God,” and “The soul is a
simple, spiritual form substantially united to a particular body
forming together with that body an integrated, unique personality.
The soul is the substantial form of the body and communicates to
the body its very subsistence.” In the objective tests gathered
at the end of the book there is a true-false test and a completion test 
on “the soul.” Yet in this same text we have embodied the work of 
Gregor Mendel, researches on learning, memory and intelligence; 
a chapter on statistical methods and sixty-three pages of objective 
tests. For the audience for which it is intended it will substantiate 
the claim of the general editor, Joseph Husslein, S.J., Ph.D., who 
states: “Christian educators will welcome the appearance of an 
Educational Psychology which meets alike the demands of science and 
religion—a book including between its covers whatever is best and 
most progressive in modern texts upon this subject, yet maintains 
inviolate all the postulates, principles and high ideals of the Christian 
Faith.” Still Pike’s Peak is not 12,365 feet in height (p. 88). 
University of Toronto. P. SANDIFORD. 


M. E. Bennett with the editorial cooperation of Lewis M. Terman. 
College and Life: Problems of Self-discovery and Self-direction. 
New York: McGraw-Hill Book Co., 1933. Pp. XIV + 456. 


This volume was prepared for use as a college textbook in orienta- 
tion courses. The author considers the function of orientation courses 
to be that of group guidance. Problems of self-discovery and self- 
direction are considered under the broad headings of “Living in 
College,” “Learning in College,” and “Building a Life.” In detail
these problems are expanded to include such topics as learning, 
remembering, silent reading, budgeting time, using the library, 
provoking thought, making friends, making personality analyses, 
developing personality traits, mental health, marriage, vocational 
planning, and a philosophy of life. These problems, the author 
asserts, are the ones which he has found most students eager to con- 
sider but which are dealt with only in an incidental way in the academic 
experience of students or not at all. 

The style is terse and to the point; most problems are introduced 
by direct questions. The treatment of topics is comprehensive, 
understandable, and usually based on objective evidence. The book 
appears to be “teachable” because of the successive presentation of
challenges to the reader and the provision of self-analysis techniques. 
Students, now surfeited with descriptive presentations of the psychology
of self-development, will doubtless welcome this extremely
practical book. Teachers will find in it excellent antidotes for the 
quack methods of the charlatan. 

In addition to its usefulness as a text-book in orientation courses, 
the book has merit as a supplementary text in elementary psychology, 
applied psychology, and educational guidance courses. While 
intended for use with college students, much of the material is suited 
to the needs of high school students about to enter college. 

Carnegie Institute of Technology. GLEN U. CLEETON.


D. C. Griffiths. The Psychology of Literary Appreciation. Educational
Research Series, No. 13. Melbourne: Melbourne
University Press, 1932. Pp. 142.


The Psychology of Literary Appreciation, by D. C. Griffiths, is 
disappointingly indefinite. The reason, however, lies not so much 
in any lack on the part of the writer as in the inadequacy of the factual 
material upon which it is possible for him to draw. As the author 
points out, practically nothing of an objective nature has been done 
in the field of appreciation of literature, the reason being two-fold: 
(1) The firm belief of people engaged in literary pursuits that both 
appreciation of and creativity in the arts are sacrosanct; and (2) 
The natural hesitancy of the psychologist to attack a problem which 
promises so little in pragmatic returns. 

The contribution that Mr. Griffiths makes in this monograph is to 
focus attention upon the importance of making a scientific approach to 
the whole question of esthetics, especially to problems concerning the 
relationship of esthetics to education. He is, however, a little too 
awed by the immensity of the task and by the multiplicity of possi- 
bilities for failure. This attitude blurs his thinking and keeps him 
from sighting clear issues and making definite recommendations. 
There is needed in esthetics the same kind of dynamic and incisive 
approach that was made by Binet, Terman, and Thorndike in their 
work on the measurement of intelligence. HERBERT A. CARROLL.

University of Minnesota. 


N. Norsworthy and M. T. Whitley. The Psychology of Childhood.
Revised Edition. New York: The Macmillan Co., 1933. Pp. 
XVII + 515. 


This well-known textbook in genetic psychology has undergone a 
revision in which the general plan and organization remain much the 
same as in the earlier edition, while the content has been brought up
to date by the inclusion of the results of the major researches that 
have appeared in recent years in the field of child psychology. The 
changes in emphasis and in terminology reflect rather clearly the 
trend of psychology in the intervening years. The sections on learning 
have been elaborated; instincts have practically disappeared, as has 
also the term “tendencies.” Corresponding to the waning of the
heredity-environment controversy, the chapters on original nature 
have been greatly condensed and finally the chapter on sense perception
has been omitted. New chapters appear on “Language
Development,” “Misdirected Tendencies,” “Learning about the
Physical Environment” and “Interpretations of Child Development.”

In contrast to other recent texts in the field, the emphasis is 
on the development of the school-age child and little attention is 
paid to the important developmental changes of infancy. The book 
is somewhat lacking in illustrative material giving intimate glimpses 
of child life. Its unique contribution appears to be two chapters 
devoted to considerations of moral and religious aspects of child 
development that are rarely more than mentioned in books of this
type. A glossary is appended as well as a bibliography of three
hundred fifty-seven references. DOROTHEA MCCARTHY.
Fordham University.


William G. Ballantine. The Logic of Science. Thomas Y. Crowell
Company, 1933. Pp. 230. 


There are better books on the nature and objectives of science 
than this one. Yet, Mr. Ballantine’s work is not without value. 
It is a good antidote for the lay reader against the obscurantisms of 
such scientific theologians as Eddington, Jeans, Millikan, and the 
puerilities of Abbé Dimnet on the art of thought. The author points 
out, in a concise clear manner, that the logic of science is the sole 
true logic, in that it is only the method of science that leads to depend- 
able knowledge. MAX SCHOEN.

Carnegie Institute of Technology. 


J. J. Hader and E. C. Lindeman. Dynamic Social Research. New
York: Harcourt, Brace & Co., 1933. Pp. X + 231. 


Although this work is primarily directed to social workers, econ- 
omists and sociologists, the techniques worked out are valid for 
psychologists working in the personnel field. It is one of the best
constructed books that has come to my desk these many months. 
The argument marches along steadily with never a deviation into 
useless side-paths. The social problem the authors study is that of 
employee representation in industrial management. After dealing 
with the situation from the historical standpoint, the writers proceed 
to develop their social philosophy, then criticise acutely social method- 
ology, and finally expound the correct methods for use in interviewing, 
observing, case analysis, charting and statistical analysis, the whole 
making a compelling case for regarding these techniques as essentially 
scientific. P. SANDIFORD. 
University of Toronto. 


Hilda Taba. The Dynamics of Education: A Methodology of Progressive
Educational Thought. New York: Harcourt, Brace &
Co., 1932. Pp. XVI + 278.


Thirty years ago I read and discussed in class Dewey’s “Child and 
the Curriculum,” one of the best essays that Dewey ever wrote. 
Dr. Taba’s “Dynamics of Education” is essentially an expanded 
“Child and the Curriculum.” During the three decades that have 
elapsed between these volumes, much water has flowed under the 
educational bridge. The period has seen the rise of the reaction 
hypothesis, behaviorism, gestalt psychology, the child-centered school, 
job analysis and a host of other things. Every one of these movements 
is severely criticised by Dr. Taba and found wanting. In fact the 
reading of “Dynamics of Education” is a most humbling experience.
Everybody has been wrong. None of us apparently realised that 
learning is synthetic, not an analytic process—a dynamic recon- 
struction of human experience through purposive behavior. The 
three essential elements in the process are (1) the learning materials 
(environmental stimulation); (2) the learner (with his nature, abilities 
and interests); and (3) the structure, form and sequences of the 
process of learning with its results. Apparently educators have seen 
elements (1) and (2) pretty clearly, but they’ve overlooked (3) alto- 
gether and in strict philosophical terminology Dr. Taba brings them to 
task. Yet although Dr. Taba describes her book as a methodology 
of progressive educational thought, the reader will be disappointed 
if he looks for cut and dried plans for achieving the new educational 
heaven. For example, in discussing aims of education we are told 
that we must create “conditions rich in media and materials for
significant and educative experiences, from which further aims may
outgrow.” Agreed, but it would have been helpful to the less phil- 
osophically minded among us to get a few practical pointers. Perhaps, 
some kind teacher will take the book and show us what it all means in 
practice. 

Dr. Kilpatrick writes an illuminating introduction and must have 
had many qualms of conscience as he remembered his lapse from grace 
when he wrote his very useful text “The Foundations of Method.”

University of Toronto. PETER SANDIFORD. 